University of Phoenix's Virtual Instructor's Manual
Introduction to Operating Systems
Section Four UNIX and LINUX
- List of Objectives
- Introduce the UNIX System
Although operating system concepts can be considered in purely theoretical terms, it is often useful to see how they are implemented in practice. This section presents an in-depth examination of the 4.3BSD operating system, a version of UNIX, as an example of the various concepts presented in this book. By examining a complete, real system, we can see how the various concepts discussed in this book relate both to one another and to practice.
This UNIX operating system was chosen in part because at one time it was almost small enough to understand and yet is not a "toy" operating system. Most of its internal algorithms were selected for simplicity, not for speed or sophistication. UNIX design has had a profound influence on many modern operating systems. One interesting point is the similarity between UNIX and MS-DOS. Spend some time discussing the difference between what really happens and what seems to happen. To the user, it is the apparent environment that is most important; the implementation details are transparent.
The replaceable shell is another important UNIX innovation. Relate UNIX time-slicing, interrupts, and memory management to traditional time-sharing and virtual memory systems; it’s important for students to realize that even an advanced operating system like UNIX has its roots in basic concepts. The UNIX text segment represents an excellent opportunity to illustrate the value of reentrant code.
The file system is regarded as the key to UNIX. In addition to the obvious pointers and links, spend some time talking about how all block I/O takes place through a buffer pool. One key concept is asynchronous I/O; for many students it’s a new idea.
I have found that identifying and diagramming an operating system’s key functions, control blocks, and pointers is an excellent way to grasp the operating system. (A picture really is worth a thousand words.)
- Introduce the LINUX System
LINUX is another UNIX-like system that has gained popularity in recent years. This section discusses the history and development of Linux and covers user and programmer interfaces that Linux presents — interfaces that owe a great deal to the UNIX tradition. Emphasis is given to the internal methods by which Linux implements these interfaces. However, since Linux has been designed to run as many standard UNIX applications as possible, it has much in common with existing UNIX implementations. You do not need to duplicate the basic description of UNIX given earlier in this section.
Linux is a rapidly evolving operating system. It is gaining in popularity, and several versions are commercially available. Among these are Red Hat Linux™ and Apache Linux™, both of which have web sites that provide complete and current information.
- Chapter Summary
As previously indicated, this section of the textbook consists of a two-part discussion covering UNIX and LINUX. Today, UNIX is a standard, particularly in the academic world, and is available on a variety of machines. More important is the impact UNIX has had on the design of other operating systems. For example, the current versions of Windows clearly reflect the UNIX influence, and Linux is a UNIX clone used to run many standard UNIX applications. Refer to UNIX as a time-sharing system, with program segments swapped in and out of memory as required. Present both operating systems in a sequential fashion so that students gain a firm understanding of UNIX key functions, control blocks, tables, and pointers, and can use this information to explore the subtle differences within the Linux system. The Linux system is a product of the Internet, so be sure to have students review the key site references listed under the Bibliographical Notes on page 211.
- Problems for Faculty to Assign or Work Through in Class
- UNIX is highly portable. What is portability? What makes UNIX so portable? Why is portability important?
Answer: A portable operating system can be moved between incompatible computers with little or no change. Unlike most operating systems, UNIX is largely written in a compiler language, C, and thus can be ported to any computer that has a C compiler. Because application programs interface with the operating system (rather than with hardware), a UNIX program can run on any computer that supports UNIX.
- What are the major differences between 4.3BSD UNIX and SYSVR3? Is one system "better" than the other? Explain your answer.
Answer: 4.3BSD includes several features not found in SYSVR3, including long file names (up to 255 characters), the Berkeley File System (faster file access and more robust), symbolic links (soft pointers to files), processes having multiple access groups, and job control (easy per-job multiprocessing). SYSVR3 has Streams (a multi-layered communications facility). Neither is "better" per se, but BSD does have more features.
- How were the design goals of UNIX different from those of other operating systems during the early stages of UNIX development?
Answer: Rather than being a market-oriented operating system, like MULTICS, with definite goals and features, UNIX grew as a tool to allow Thompson and Ritchie to get their research done at Bell Labs. They found a spare PDP-11 system and wrote UNIX to help them with text-processing requirements. It therefore exactly suited their personal needs, not those of a company.
- Briefly explain the UNIX swapping process.
Answer: Each time UNIX gets control, the swapping process searches for a ready process that has been swapped out. If it finds one, it swaps that process in. If necessary, another process is swapped out to make space for it. Processes waiting for relatively slow events are primary candidates to be swapped out. Main memory residency time is also a consideration.
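The policy in this answer can be sketched as a small classroom simulation. It is an illustration only, not kernel code: the `Proc` record, its field names, and the helper functions are hypothetical, and the rule that ties are broken by longest residency is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    in_memory: bool
    ready: bool
    waiting_on_slow_event: bool
    residency_ticks: int  # time spent in main memory (hypothetical field)
    size: int             # memory units the process needs


def choose_swap_in(procs):
    """Return a ready process that is currently swapped out, or None."""
    for p in procs:
        if p.ready and not p.in_memory:
            return p
    return None


def choose_swap_out(procs):
    """Prefer processes waiting on slow events, then the longest-resident one."""
    resident = [p for p in procs if p.in_memory]
    if not resident:
        return None
    return max(resident, key=lambda p: (p.waiting_on_slow_event, p.residency_ticks))


def swap_step(procs, free_memory):
    """One pass of the swapper; returns (swapped_in_pid, [swapped_out_pids])."""
    incoming = choose_swap_in(procs)
    if incoming is None:
        return None, []
    evicted = []
    while free_memory < incoming.size:
        victim = choose_swap_out(procs)
        if victim is None:
            return None, evicted  # nothing left to evict; give up this pass
        victim.in_memory = False
        victim.residency_ticks = 0
        free_memory += victim.size
        evicted.append(victim.pid)
    incoming.in_memory = True
    return incoming.pid, evicted
```

Students can vary the process table and the amount of free memory to see which process becomes the victim in each case.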
- What are the advantages and disadvantages of writing an operating system in a high-level language such as C?
Answer: C makes UNIX highly portable, as evidenced by the many systems it runs on. It is also (arguably) faster to write and debug code in a high-level language, allowing UNIX to be modified more quickly than assembly-language-based operating systems. Of course it runs less efficiently than if it had been written in assembly language, like most other operating systems. It is generally larger than assembly-language operating systems too.
- Why are there many different versions of UNIX currently available? In what ways is this diversity an advantage to UNIX? In what ways is it a disadvantage?
Answer: AT&T made the source code of UNIX available to universities and other sites, where experimentation and expansion took place. This allowed many people to have an influence on UNIX and to try out their own ideas. These ideas were circulated and the best ones were culled for inclusion in the standard varieties of UNIX. The disadvantage this causes is that there is no "standard" version of UNIX. Programs written for UNIX may only run on one, or some, versions of UNIX, but rarely all.
- Early UNIX systems used swapping for memory management, whereas 4.3BSD used paging and swapping. Discuss the advantages and disadvantages of the two memory methods.
Answer: When a CPU is slow, compared to its backing store, swapping makes sense. The CPU can issue one transfer command and the I/O system can move an entire process into or out of main memory. As CPUs get faster, paging makes more sense. The CPU has more time to decide which pages are not being used and to issue transfer requests. Paging generally requires "smarter" hardware, with access bits for each page of memory, or at least invalid page bits. Swapping wastes memory due to external fragmentation. Even on paging systems, swapping is useful when thrashing is occurring due to too many active processes touching too many pages.
- Describe the modifications to a file system that the 4.3BSD kernel makes when a process requests the creation of a new file /tmp/foo and writes to that file sequentially until the file size reaches 20K.
Answer: Let’s assume that the block size is 4K. The kernel receives a creat or open system call (with the "create" flag set). It locates the directory in which the file is requested to be created and verifies that the process has write permission in that directory, and that no file exists with that same name without write permission. It locates the cylinder group that contains the directory, and finds a free inode in that cylinder group if there is room; if not, it does a "long seek" to a nearby group that has room.
It allocates the inode by removing it from the free inode list. It then modifies the free inode to show that it is used and updates all the appropriate fields (write date, size = 0, owner and group, protection, etc.). The system then creates a new directory entry in the parent directory’s data area that has the name of the new file and a pointer to its newly allocated inode. The inode is then placed in the per-process table of open files and its file pointer is set to 0. The kernel’s file-structure table and the in-core inode list are updated too. The directory entry is then written to disk to assure that directories are always consistent. The system then receives "write" system calls until 20K of data are received. If the caller is efficient, the writes will occur in 4K chunks (the size of a complete block). If this is the case, the system locates a free block in the cylinder group and changes the free block bit map to show the block is in use. It changes the inode such that the next free direct block is changed to have the value of the disk block. So the first write of 4K would allocate the first direct block, the second write the second block, and so on. These writes are buffered in the block buffer cache until the system deems it necessary to write them to disk.
If writes are done in other than 4K increments, the system must allocate fragments of 1K to handle any writes that do not end at a 4K increment. Each following write would require the system to copy the data in any fragments left by the last write into a new block, and start appending the new data there. This is obviously very inefficient (2 reads and a write per write). Fortunately, the disk buffer cache alleviates some of this overhead by not writing data immediately to disk.
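The block and fragment arithmetic in this answer can be worked through with a short sketch. It assumes the 4K block and 1K fragment sizes used above and ignores indirect blocks, which a 20K file does not need; the function names are hypothetical.

```python
BLOCK_SIZE = 4 * 1024     # block size assumed in the answer above
FRAGMENT_SIZE = 1 * 1024  # fragment size assumed in the answer above


def blocks_and_fragments(file_size):
    """Return (full_blocks, tail_fragments) needed to hold file_size bytes."""
    full_blocks, tail = divmod(file_size, BLOCK_SIZE)
    # Round the partial tail up to a whole number of fragments.
    tail_fragments = -(-tail // FRAGMENT_SIZE)
    return full_blocks, tail_fragments


def layout_after_each_write(write_size, total):
    """Trace (size, full_blocks, tail_fragments) as the file grows sequentially."""
    size, trace = 0, []
    while size < total:
        size = min(size + write_size, total)
        trace.append((size, *blocks_and_fragments(size)))
    return trace
```

With 4K writes the trace never shows a fragment, whereas `layout_after_each_write(3 * 1024, 20 * 1024)` shows the tail fragments being allocated and then absorbed into full blocks as the file grows.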
- What effects on system performance would the following changes to 4.3BSD have? Explain your answers.
- The merging of the block buffer cache and the process paging space.
- Clustering disk I/O into large chunks.
- Implementing and using shared memory to pass data between processes, rather than using RPC or sockets.
- Using the ISO seven-layer networking model, rather than the ARM network model.
Answer:
- Such a merge was done in SunOS 4.1. The result is a more general model of memory use. If lots of file transfers are occurring, more memory is used to hold data blocks. If more processes are executing, more storage is devoted to paging.
- Another change to SunOS. This change resulted in more efficient use of the disks in the system — larger chunks of data are transferred with fewer seeks.
- More efficient data transfer between communicating processes.
- Less efficient network use, as a packet spends more time traversing the network protocol stack before and after being transmitted on the network.
- In Linux, shared libraries perform many operations central to the operating system. What is the advantage of keeping this functionality out of the kernel? Are there any drawbacks?
Answer: There are a number of reasons for keeping functionality in shared libraries rather than in the kernel itself. These include:
Reliability. Kernel-mode programming is inherently higher risk than user-mode programming. If the kernel is coded correctly so that protection between processes is enforced, then an occurrence of a bug in a user-mode library is likely to affect only the currently executing process, whereas a similar bug in the kernel could conceivably bring down the entire operating system.
Performance. Keeping as much functionality as possible in user-mode shared libraries helps performance in two ways. First of all, it reduces physical memory consumption: kernel memory is non-pageable, so every kernel function is permanently resident in physical memory, but a library function can be paged in from disk on demand and does not need to be physically present all of the time. Although the library function may be resident in many processes at once, page sharing by the virtual memory system means that it is only loaded into physical memory at most once.
Secondly, calling a function in a loaded library is a very fast operation, but calling a kernel function through a kernel system service call is much more expensive. Entering the kernel involves changing the CPU protection domain, and once in the kernel, all of the arguments supplied by the process must be very carefully checked for correctness: the kernel cannot afford to make any assumptions about the validity of the arguments passed in, whereas a library function might reasonably do so. Both of these factors make calling a kernel function much slower than calling the same function in a library.
Manageability. Many different shared libraries can be loaded by an application. If new functionality is required in a running system, shared libraries to provide that functionality can be installed without interrupting any already-running processes. Similarly, existing shared libraries can generally be upgraded without requiring any system downtime. Unprivileged users can create shared libraries to be run by their own programs. All of these attributes make shared libraries generally easier to manage than kernel code.
There are, however, a few disadvantages to having code in a shared library. Some code is obviously unsuitable for implementation in a library, including low-level functionality such as device drivers or file systems. In general, services shared around the entire system are better implemented in the kernel if they are performance-critical, since the alternative (running the shared service in a separate process and communicating with it through inter-process communication) requires two context switches for every service requested by a process. In some cases it may be appropriate to prototype a service in user mode but implement the final version as a kernel routine.
Security is also an issue. A shared library runs with the privileges of the process calling the library. It cannot directly access any resources inaccessible to the calling process, and the calling process has full access to all of the data structures maintained by the shared library. If the service being provided requires any privileges outside of a normal process, or if the data managed by the library needs to be protected from normal user processes, then libraries are inappropriate, and a separate server process (if performance permits) or a kernel implementation is required.
- The Linux source code is freely and widely available over the Internet or from CD-ROM vendors. What are three implications of this availability on the security of the Linux system?
Answer: The open availability of an operating system’s source code has both positive and negative impacts on security, and it is probably a mistake to say that it is definitely a good thing or a bad thing.
Linux’s source code is open to scrutiny by both the good guys and the bad guys. In its favor, this has resulted in the code being inspected by a large number of people who are concerned about security and who have eliminated any vulnerabilities they have found. On the other hand is the "security through obscurity" argument, which states that attackers’ jobs are made easier if they have access to the source code of the system they are trying to penetrate. By denying attackers information about a system, the hope is that it will be harder for those attackers to find and exploit any security weaknesses that may be present.
In other words, open source code implies both that security weaknesses can be found and fixed faster by the Linux community, increasing the security of the system, and that attackers can more easily find any weaknesses that do remain in Linux.
There are other implications for source code availability, however. One is that if a weakness in Linux is found and exploited, then a fix for that problem can be created and distributed very quickly. (Typically, security problems in Linux tend to have fixes available to the public within 24 hours of their discovery.) Another is that if security is a major concern to particular users, then it is possible for those users to review the source code to satisfy themselves of its level of security or to make any changes if they wish to add new security measures.
- What is the shell? What is the kernel?
Answer: The shell is the UNIX command processor; users and programmers communicate with the operating system through the shell. The kernel is the resident portion of the operating system that holds the machine-dependent code, including the routines that communicate directly with hardware.
- Explain the difference between absolute path names and relative path names.
Answer: Absolute path names start at the root of the file system and are distinguished by a slash at the beginning of the path name; /user/local/font is an absolute path name. Relative path names start at the current directory, which is an attribute of the process accessing the path name.
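The distinction can be demonstrated with Python's standard `posixpath` module. This is a minimal sketch: the `resolve` helper is hypothetical, and it normalizes path names lexically only, without consulting the file system or following symbolic links.

```python
import posixpath


def resolve(path, current_directory):
    """Resolve a UNIX-style path name against a process's current directory."""
    if posixpath.isabs(path):
        # Absolute: begins with a slash, so the current directory is irrelevant.
        return posixpath.normpath(path)
    # Relative: interpreted starting from the current directory.
    return posixpath.normpath(posixpath.join(current_directory, path))
```

For example, `resolve("font", "/user/local")` and `resolve("/user/local/font", "/tmp")` both name the same file, but only the first result depends on the process's current directory.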
- Describe a distinct similarity in the file management systems of MS-DOS and UNIX.
Answer: Files are organized in tree-structured directories. Directories are themselves files that contain information about how to find other files.
- In what circumstances is the system-call sequence fork execve most appropriate? When is vfork preferable?
Answer: Since vfork is a fairly dangerous system call, it should only be used when a large process needs to be started. For small child processes, the fork execve call sequence is almost as efficient and does not allow the child to affect the parent’s address space.
- Briefly explain how processes are created under UNIX.
Answer: The fork utility creates a duplicate process. After executing a fork, the parent normally waits for the child process to die. The child process, which gets a different return code, normally calls exec, which overlays new text and data segments on top of the original process.
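The fork, exec, and wait sequence described in this answer can be tried from Python, whose `os` module wraps the underlying system calls. This is a minimal sketch for a UNIX-like system; the `spawn_and_wait` helper is hypothetical, and the new image is simply another Python interpreter chosen for convenience.

```python
import os
import sys


def spawn_and_wait(exit_code):
    """Fork a child that execs a new image; return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0. Overlay this process with a new image.
        try:
            os.execv(
                sys.executable,
                [sys.executable, "-c", f"import sys; sys.exit({exit_code})"],
            )
        finally:
            # Reached only if exec failed; exit without running parent cleanup.
            os._exit(127)
    # Parent: fork returned the child's process ID. Wait for the child to die.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

The different return values of `os.fork()` in the parent and the child are what let the same code take two paths after the call.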
- Briefly explain UNIX dispatching.
Answer: The death of a process generates an event signal. The event-wait routine gets control, marks all processes waiting for that event ready, searches the table of ready processes, and gives control to the highest priority ready process.
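The dispatching steps in this answer can be sketched as a simple table scan. The dictionary layout and the `dispatch` function are hypothetical, and the sketch assumes that a larger number means a higher priority, which need not match the numbering a real UNIX system uses.

```python
def dispatch(process_table, event):
    """Wake processes waiting on `event`, then pick the highest-priority ready one."""
    for proc in process_table:
        if proc["waiting_for"] == event:
            proc["waiting_for"] = None
            proc["ready"] = True
    ready = [p for p in process_table if p["ready"]]
    if not ready:
        return None
    # Assumption: a larger number means a higher priority.
    return max(ready, key=lambda p: p["priority"])["pid"]
```

A useful class exercise is to build a table of three or four processes by hand and predict which process gets control after a given event.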
- Distinguish between an event and a process.
Answer: An event generates an electronic signal that is acted upon by the operating system. The event occupies no main memory. A process is the execution of an image; in other words, a program. It occupies memory and exists for a measurable amount of time. An event exists only for an instant.
- The UNIX buffering scheme makes pipes easy to implement. Explain.
Answer: Since all block I/O takes place through a buffer pool, changing the system input or output device consists of nothing more than changing the source or destination of a buffer.
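A pipe can be exercised directly to show that it behaves like a buffer with a writer at one end and a reader at the other. This is a minimal sketch for a UNIX-like system: the `send_through_pipe` helper is hypothetical, and because both ends live in one process the data must be small enough to fit in the pipe's buffer.

```python
import os


def send_through_pipe(data):
    """Write bytes into a pipe and read them back from the other end."""
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, data)  # assumes data fits in the pipe buffer
    finally:
        os.close(write_fd)        # closing the write end lets the reader see end-of-file
    chunks = []
    with os.fdopen(read_fd, "rb") as reader:
        while True:
            chunk = reader.read(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

In normal use the two ends belong to different processes, typically a parent and a child created with fork, so the writer's output becomes the reader's input.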
- Briefly explain how UNIX converts a file name to the file’s location on disk. Why is the system file table necessary?
Answer: When a file is opened, UNIX uses the working directory’s i-number to begin its search for the file. Recorded in the directory, along with the file’s name, is the file’s i-number. That i-number points to an inode that holds the disk addresses of the file’s blocks. The system file table allows UNIX to keep track of multiple processes accessing the same device or the same file.
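The link between a directory entry and an inode can be observed from user code. This is a minimal sketch assuming a file system that supports hard links; the file names and the `inode_numbers_for_links` helper are hypothetical.

```python
import os
import tempfile


def inode_numbers_for_links():
    """Create a file and a hard link; return both i-numbers and the link count."""
    with tempfile.TemporaryDirectory() as directory:
        original = os.path.join(directory, "original.txt")
        alias = os.path.join(directory, "alias.txt")
        with open(original, "w") as f:
            f.write("data")
        os.link(original, alias)  # second directory entry for the same inode
        first = os.stat(original)
        second = os.stat(alias)
        return first.st_ino, second.st_ino, first.st_nlink
```

Both names report the same i-number, which shows that the directory stores only the name and the i-number while the inode describes the file itself.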
- Briefly explain how UNIX uses a process structure to find the various parts of a process. Emphasize with students that every process has both a user and a system phase. Most ordinary work is done in user mode, but, when a system call is made, it is performed in system mode. See Figure 4-6. It might be useful to go over it in class.
- A UNIX user communicates with the operating system through the
- device driver
- text segment
- Which UNIX segment is reentrant?
- system data
- UNIX processes are created by a system primitive called
- Under UNIX, the death of a process generates a(n)
- page fault
- event signal
- Free blocks on a UNIX disk are tracked in the
- super block
- The disk addresses of the blocks associated with a single ordinary UNIX file are listed in a(n)
- super block
- UNIX block I/O is
- none of the above
- Both user-written and systems programs are normally executed by a
- graphic user interface
- command interpreter
- argument list
- A _________ is used to carry the data from one process to another
- shell script
- command interpreter
- spool activity
- Under UNIX the execution of an image is called a
- Under UNIX all block I/O takes place through a buffer pool.
- The UNIX swapping process is part of the kernel.
- The standard UNIX shell can be replaced by a custom shell.
- The UNIX fork utility overlays a process with new text and data segments.
- Under UNIX, parent and child processes are almost always executed in parallel.
- The routines that communicate directly with hardware are all concentrated in the UNIX kernel.
- Under UNIX, the execution of an image is called a process.
- Because of its process orientation, UNIX has no need for a dispatcher.
- Each UNIX user has access to a private text segment.
- Under UNIX, pipes are implemented in the shell.
Answers: 21.T; 22.T; 23.T; 24.F; 25.F; 26.T; 27.T; 28.F; 29.F; 30.T