
Friday, November 23, 2007

UNIX

Unix (officially trademarked as UNIX®, sometimes also written as Unix or Unix® with small caps) is a computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs including Ken Thompson, Dennis Ritchie and Douglas McIlroy. Today's Unix systems are split into various branches, developed over time by AT&T as well as various commercial vendors and non-profit organizations.
As of 2007, the owner of the
trademark UNIX® is The Open Group, an industry standards consortium. Only systems fully compliant with and certified to the Single UNIX Specification qualify as "UNIX®" (others are called "Unix system-like" or "Unix-like").
During the late 1970s and early 1980s, Unix's influence in academic circles led to large-scale adoption of Unix (particularly of the
BSD variant, originating from the University of California, Berkeley) by commercial startups, the most notable of which is Sun Microsystems. Today, in addition to certified Unix systems, Unix-like operating systems such as Linux and BSD are commonly encountered. Sometimes, "traditional Unix" may be used to describe a Unix or an operating system that has the characteristics of either Version 7 Unix or UNIX System V.
What is UNIX®?
The Open Group holds the definition of what a UNIX system is and its associated trademark in trust for the industry.
In 1994 Novell (who had acquired the UNIX systems business of AT&T/USL) decided to get out of that business. Rather than sell the business as a single entity, Novell transferred the rights to the UNIX trademark and the specification (that subsequently became the Single UNIX Specification) to The Open Group (at the time X/Open Company). Subsequently, it sold the source code and the product implementation (UNIXWARE) to SCO. The Open Group also owns the trademark UNIXWARE, transferred to them from SCO more recently.
Today, the definition of UNIX® takes the form of the worldwide Single UNIX Specification, integrating X/Open Company's XPG4, IEEE's POSIX standards, and ISO C. Through continual evolution, the Single UNIX Specification is the de facto and de jure standard definition for the UNIX system application programming interfaces. As the owner of the UNIX trademark, The Open Group has separated the UNIX trademark from any actual code stream itself, thus allowing multiple implementations. Since the introduction of the Single UNIX Specification, there has been a single, open, consensus specification that defines the requirements for a conformant UNIX system.
There is also a mark, or brand, that is used to identify those products that have been certified as conforming to the Single UNIX Specification, initially UNIX 93, followed subsequently by UNIX 95, UNIX 98 and now UNIX 03.
The Open Group is committed to working with the community to further the development of standards conformant systems by evolving and maintaining the Single UNIX Specification and participation in other related standards efforts. Recent examples of this are making the standard
freely available on the web, permitting reuse of the standard in open source documentation projects, providing test tools, and developing the POSIX and LSB certification programs.
From this page you can read about the history of the UNIX system over the past 30 years or more. You can learn about the Single UNIX Specification, and read or download online versions of the specification. You can also get involved in the ongoing development and maintenance of the Single UNIX Specification, by joining
the Austin Group (whose approach to specification development is "write once, adopt everywhere") or The Open Group's Base Working Group, or by getting involved in the UNIX Certification program.

The Creation of the UNIX* Operating System
After three decades of use, the UNIX* computer operating system from Bell Labs is still regarded as one of the most powerful, versatile, and flexible operating systems (OS) in the computer world. Its popularity is due to many factors, including its ability to run on a wide variety of machines, from micros to supercomputers, and its portability -- all of which led to its adoption by many manufacturers.
Like another legendary creature whose name also ends in 'x,' UNIX rose from the ashes of a multi-organizational effort in the early 1960s to develop a dependable timesharing operating system.
The joint effort was not successful, but a few survivors from Bell Labs tried again, and what followed was a system that offers its users a work environment that has been described as "of unusual simplicity, power, and elegance...."
The system also fostered a distinctive approach to software design -- solving a problem by interconnecting simpler tools, rather than creating large monolithic application programs.
Its development and evolution led to a new philosophy of computing, and it has been a never-ending source of both challenges and joy to programmers around the world.

Thursday, November 22, 2007

What is loosely coupled hardware? Tighly coupled?

"Loosely" and "tightly" coupled usually refer to things other than hardware, such as connections. The same concepts may apply to hardware in some situations; I just haven't run into it.

Tightly coupled usually means synchronous connections where if there is any interruption both ends of the communication have to wait until the connection is restored.

Loosely coupled usually means asynchronous connections where if there is any interruption, one or both ends of the communication are able to function separately until the connection is restored.

There can be reasons why one or the other situation is supported, "tightly coupled" communications ensure the integrity and state on both ends while "loosely coupled" communications can mean better fault tolerance in that things get done as much as possible even when conditions aren't perfect.

Coupling refers to how much a component (software or hardware) relies on another component. Coupling can be loose or tight. With low coupling, a change in one module will not require a change in the implementation of another module. Low coupling is often a sign of a well-structured computer system.
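As a sketch of the software side of this idea, the toy Python classes below contrast a tightly coupled client, which calls a service directly and blocks on it, with a loosely coupled one that only hands messages to a queue. All of the names here (TightlyCoupledClient, EchoService, and so on) are hypothetical, invented for this illustration:

```python
import queue

# Tight coupling: the caller holds a direct reference to the service and
# blocks on it. If the service is unavailable, the caller cannot proceed.
class TightlyCoupledClient:
    def __init__(self, service):
        self.service = service

    def send(self, message):
        return self.service.handle(message)   # synchronous: waits for the reply

# Loose coupling: the caller only knows about a queue. It can keep working
# even if nothing is currently reading from the other end.
class LooselyCoupledClient:
    def __init__(self):
        self.outbox = queue.Queue()

    def send(self, message):
        self.outbox.put(message)              # asynchronous: returns immediately

class EchoService:
    def handle(self, message):
        return f"handled: {message}"

tight = TightlyCoupledClient(EchoService())
print(tight.send("ping"))                     # caller waited for this result

loose = LooselyCoupledClient()
loose.send("ping")                            # caller moved on; message waits
print(loose.outbox.get())
```

In the loose case the client keeps working even if no consumer is draining the queue, which mirrors the fault-tolerance point above.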

What do you think is the advantages of distributed systems over parallel systems? Parallel systems over distributed systems?

A distributed system's main advantage over a parallel system approach is the ability to re-distribute its different functions onto different platforms. That gives you the ability both to scale up (substitute bigger boxes for existing ones) and to scale out (add additional boxes that provide the same functionality).

A parallel system approach instead tries to concentrate on internal speed, which is nearly instantaneous compared to the latencies involved for a distributed system.

Tuesday, November 20, 2007

Main Differences Between Mainframe and Personal Computer Operating Systems

*An operating system is software that actually runs the computer and is the interface between the user and the computer. There are different operating systems for mainframes and personal computers. Mainframe operating systems are more powerful and expensive, whereas operating systems for PCs are readily available and not as powerful.

*The OS for a mainframe is designed to handle hundreds of users at a time. This means it manages hundreds of displays (monitors) and keyboards, and keeps track of the input from each user, the processing required/requested on that input, and the resulting output. On the other hand, most other operating systems are not truly multiuser. If you create four accounts on Windows XP, that does not make the OS multiuser, because only a single user can log in and interact with the machine at a time.

*PC OS's are not concerned with fair use or maximal use of computer resources. Instead, they try to optimize the usefulness of the computer for an individual user, usually at the expense of overall efficiency. Mainframe OS's need more complex scheduling and I/O algorithms to keep the various system components efficiently occupied.

*Generally, operating systems for batch systems have simpler requirements than those for personal computers. Batch systems do not have to be concerned with interacting with a user as much as a personal computer does. As a result, an operating system for a PC must be concerned with response time for an interactive user; batch systems have no such requirement. A pure batch system also may not have to handle time sharing, whereas a time-sharing operating system must switch rapidly between different jobs.

*Consider how many CPU cycles are used by graphical user interfaces (GUIs): optimizing the computer's usefulness for an individual user often comes at the expense of efficiency.

*Personal computer operating systems assume a fairly fixed configuration: the computer needs a keyboard, a mouse, one processor, at least one monitor, and a primary hard drive. Mainframes require a much more flexible structure: a mainframe can have several hard drives and several processors that cooperate to do the job.

*Mainframe software also has more robust process control, so that one crashing program will not usually bring down the mainframe, whereas on a PC there is less room for such safeguards and one process can crash the entire system.

*Generally, the difference looks like the difference between a boxer who fights alone and a big army with many divisions that cooperate simultaneously to do a huge task on a large scale.




Multiprogramming

Early computer systems were extremely expensive to purchase and to operate. Unfortunately, they often sat idle as their human operators tended to their duties at a human's pace. Even a group of highly trained computer operators could not work fast enough to keep even the earliest tape-fed batch system's CPU busy. The advent of multiprogramming broke through the utilization barrier by removing most of the human factor in CPU utilization.
In a multiprogramming-capable system, jobs to be executed are loaded into a pool. Some number of those jobs are loaded into main memory, and one is selected from the pool for execution by the CPU. If at some point the program in progress terminates or requires the services of a peripheral device, the control of the CPU is given to the next job in the pool. As programs terminate, more jobs are loaded into memory for execution, and CPU control is switched to another job in memory. In this way the CPU is always executing some program or some portion thereof, instead of waiting for a printer, tape drive, or console input [1].
An important concept in multiprogramming is the degree of multiprogramming. The degree of multiprogramming describes the maximum number of processes that a single-processor system can accommodate efficiently[2]. The primary factor affecting the degree of multiprogramming is the amount of memory available to be allocated to executing processes. If the amount of memory is too limited, the degree of multiprogramming will be limited because fewer processes will fit in memory. A factor inherent in the operating system itself is the means by which resources are allocated to processes. If the operating system can not allocate resources to executing processes in a fair and orderly fashion, the system will waste time in reallocation, or process execution could enter into a deadlock state as programs wait for allocated resources to be freed by other blocked processes. Other factors affecting the degree of multiprogramming are program I/O needs, program CPU needs, and memory and disk access speed[3].
Processes maintained in the computer's main memory are considered to be executing concurrently even though the CPU is (usually) capable of executing only one instruction at a time. The number of processes that can be held in memory is dependent on the amount of memory available to the system. That amount may be any combination of real or virtual memory, virtual memory being a portion of on-line mass storage allocated to hold code in the process of being executed which cannot fit into available real memory. If a process is too large to fit into the memory allocated to it, portions of its code may be stored temporarily on disk. When this code is required, the operating system will load it into memory and execution will continue. The management process by which this code is swapped into and out of memory is referred to as paging. A similar system can be used to manage data-segment memory [3][1].
As the number of processes (degree of multiprogramming) increases in a system that supports paging, the amount of memory available to execute processes in decreases and the number of paging operations required increases. At some point the amount of time the CPU spends paging code and data will drag system performance down. This phenomenon, called "thrashing," is a manifestation of exceeding the degree of multiprogramming for a system [2][3][1].
In contrast to a multiprogramming system, a batch system executes its jobs sequentially, and might be referred to as a "monoprogramming" system (though certain batch systems have monitor programs, which might classify them loosely as multiprogramming systems). Batch systems were developed prior to multiprogramming as a means of increasing the efficiency of computer time use. In a batch system, jobs with similar requirements (e.g., jobs needing the same compiler) are collected together and loaded into the system. Each job is taken sequentially, and the next job's execution starts after the previous one stops.
The batch system is not as efficient as the multiprogrammed system because the CPU must sit idle while waiting on I/O devices or human input; unlike in a multiprogrammed system, no other program is ready to utilize the CPU and its resources during those waits. Enhancements to early batch systems included copying all punch cards to tape and placing all output onto tape for later printing, both operations having been done on a separate system dedicated to those purposes [1].
Efficient use of computer time was the impetus for the development of multiprogramming systems. Multiprogramming freed the CPU from relying on the inherent slowness of the operator, and permitted it to work while waiting on peripheral devices. Every computer system has a limit to the degree of multiprogramming it will support. This limit is primarily based on the amount of main memory available to the system and the efficiency of the operating system's resource allocation algorithms.

Timesharing
Timesharing, also called "time slicing," "time division multiplexing," or even "multitasking," is a system whereby system focus is switched between each of its users. Ideally, users of a timesharing system will not be aware that they are sharing the computer with one or more other users.
Timesharing came about as an extension of multiprogramming. Old batch systems, including multiprogramming batch systems, were usually isolated from their end-users. Due to this isolation, programmers were required to plan for every contingency in designing the input for their code. Many times a programmer would receive an output printout or a memory dump of their program. Interaction with the computer system was not possible due to the inefficiency introduced by extensive human interaction [1].
Timesharing provided a means by which system users could interact with the computer and the computer could direct its idle time toward other users or other processes. Now programmers could write programs that interacted with the user; they even had the opportunity to debug their code in a live environment rather than a stack of paper printout [1].
To provide service to individual users and maintain their operating environment, timesharing systems employ some means of context switching. In context switching, a user's execution environment is loaded into memory, some operation is performed, and then the environment is placed back into temporary storage until the CPU comes back to service the next request [2].
Timeshare systems allowed multiple users access to the same system, so protection had to be provided for each user's environment and for the operating system itself. Also, timeshare systems had to provide the illusion of immediate system interaction in order to be practical and attractive to users demanding real-time interaction.
To achieve effective user interaction, processes were executed in a defined order and were only given a certain amount of CPU time. This way everyone gets the CPU's attention at some point, and no one can take complete control. Two common schemes are FIFO and round robin. The FIFO scheme processes requests in the order in which they are received. Unfortunately, FIFO is not conducive to interactive processing because it permits jobs to run to completion. Thus short jobs wait for long jobs, and job priority cannot be defined [3].
On the other hand, round-robin scheduling is a preemptive method of allocating CPU time. In this scheme, processes gain CPU control in a FIFO method, but after a certain amount of time are stopped, or preempted, and placed back into the queue. It is in this scheme that context switching becomes a central issue. Efficiency in setup and clean up for processes determines the overhead introduced by the system. TSS, the time sharing system for the IBM System 360/67, used a scheduling scheme similar to a combination FIFO/round-robin. [2]
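The round-robin scheme described above can be sketched in a few lines of Python. This is a toy simulation, not any particular system's scheduler; the round_robin function and its job table are invented for the illustration:

```python
from collections import deque

def round_robin(jobs, quantum):
    """Simulate round-robin scheduling.

    jobs: dict mapping job name -> total CPU time needed.
    Returns the order in which (job, time_slice) pairs ran.
    """
    ready = deque(jobs.items())          # FIFO ready queue
    trace = []
    while ready:
        name, remaining = ready.popleft()
        run = min(quantum, remaining)    # run for at most one quantum
        trace.append((name, run))
        remaining -= run
        if remaining > 0:                # preempted: back of the queue
            ready.append((name, remaining))
    return trace

print(round_robin({"A": 5, "B": 3, "C": 1}, quantum=2))
```

With a quantum of 2, jobs A, B, and C each get a slice in FIFO order, and any job with work remaining goes to the back of the queue until it completes; the short job C finishes on its very first slice instead of waiting behind A and B as it would under pure FIFO.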
Compared to a timeshare system, the scheduling policy of a batch system is a hybrid of round-robin and FIFO, but with more emphasis on FIFO. Earlier systems relied entirely on a FIFO scheme; one job was loaded into the system and ran to completion before another could start. Later batch systems improved on this theme to provide one job with the CPU for as long as it could execute without requesting an I/O device. Upon such a request, the CPU switched execution to the next job in the queue [1].
To make them practical, timesharing systems usually supported some sort of on-line central file system and a means of organizing it. MULTICS (Multiplexed Information and Computing Service) was the first timesharing system to support a centralized, hierarchically organized file system [4].
Timesharing systems, the evolution of multiprogrammed systems, brought the computer to the user. Utilizing process scheduling and queuing schemes, individual users were provided the illusion of being the sole user of the system, unlike batch systems, where jobs had the opportunity to capture the attention of the CPU from other users. These systems typically incorporated a shared file system where user code and data could be maintained and developed. The timesharing system leveraged the computing investment and at the same time increased the level of service to the end user.



Multiprogramming

Multiprogramming is a rudimentary form of parallel processing in which several programs are run at the same time on a uniprocessor. Since there is only one processor, there can be no true simultaneous execution of different programs. Instead, the operating system executes part of one program, then part of another, and so on. To the user it appears that all programs are executing at the same time.
If the machine has the capability of causing an
interrupt after a specified time interval, then the operating system will execute each program for a given length of time, regain control, and then execute another program for a given length of time, and so on. In the absence of this mechanism, the operating system has no choice but to begin to execute a program with the expectation, but not the certainty, that the program will eventually return control to the operating system.
If the machine has the capability of protecting
memory, then a bug in one program is less likely to interfere with the execution of other programs. In a system without memory protection, one program can change the contents of storage assigned to other programs or even the storage assigned to the operating system. The resulting system crashes are not only disruptive, they may be very difficult to debug since it may not be obvious which of several programs is at fault.



Multiprogramming
Early computers ran one process at a time. While the process waited for servicing by another device, the CPU was idle. In an I/O intensive process, the CPU could be idle as much as 80% of the time. Advancements in operating systems led to computers that load several independent processes into memory and switch the CPU from one job to another when the first becomes blocked while waiting for servicing by another device. This idea of multiprogramming reduces the idle time of the CPU. Multiprogramming accelerates the
throughput of the system by efficiently using the CPU time.
Programs in a multiprogrammed environment appear to run at the same time. Processes running in a multiprogrammed environment are called concurrent processes. In actuality, the CPU processes one instruction at a time, but can execute instructions from any active process.


CPU utilization of a system can be improved by using multiprogramming. Let P be the fraction of time that a process spends away from the CPU, waiting for I/O. If there is one process in memory, the CPU utilization is (1 - P). If there are N processes in memory, the probability that all N processes are simultaneously waiting for I/O is P^N. The CPU utilization is therefore (1 - P^N), where N is called the multiprogramming level (MPL) or the degree of multiprogramming. As N increases, the CPU utilization increases. While this equation suggests that a CPU continues to work more efficiently as more and more processes are added, in practice this cannot be true: once the system passes the point of optimal CPU utilization, it thrashes.
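The 1 - P^N estimate is easy to tabulate. The small Python helper below (hypothetical, written just for this illustration) shows how quickly utilization climbs for an I/O-bound workload with P = 0.8:

```python
def cpu_utilization(p, n):
    """CPU utilization with n processes, each spending fraction p
    of its time waiting for I/O: 1 - p**n."""
    return 1 - p ** n

# A heavily I/O-bound process (80% waiting) barely uses the CPU alone,
# but a few such processes together keep the CPU busy most of the time.
for n in (1, 2, 4, 8):
    print(n, round(cpu_utilization(0.8, n), 3))
# prints: 1 0.2 / 2 0.36 / 4 0.59 / 8 0.832
```

Note that this simple model assumes the processes' I/O waits are independent and memory is unlimited, which is why, as the text says, it stops being realistic once the system starts to thrash.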
(For more on OS performance issues, refer to the Operating System Performance Module.)
In order to use the multiprogramming concept, processes must be loaded into independent sections or
partitions of memory. So, main memory is divided into fixed-sized or variable-sized partitions. Since a partition may not be large enough for the entire process, virtual memory is implemented to keep the processes executing. The answers to several questions are important to implementing an efficient virtual memory system in a multiprogrammed environment.


Spooling

In computer science, spooling refers to a process of communicating data to another program by placing it in a temporary working area, where the other program can access it at some later point in time. Traditional uses of the term apply it to situations where there is little or no direct communication between the program that writes the data and the program that reads it. Spooling is often used when devices access data at different rates. The temporary working area provides a waiting station where data can reside while a slower device processes it at its own rate. New data is added, and old data removed, only at the ends of the area, i.e., there is no random access or editing.
It can also refer to a
storage device that incorporates a physical spool, such as a tape drive.
The most common spooling application is print spooling. In print spooling,
documents are loaded into a buffer (usually an area on a disk), and then the printer pulls them off the buffer at its own rate. Because the documents are in a buffer where they can be accessed by the printer, the user is free to perform other operations on the computer while the printing takes place in the background. Spooling also lets users place a number of print jobs in a queue instead of waiting for each one to finish before specifying the next one.
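The buffer-plus-background-printer arrangement can be sketched with a queue and a worker thread in Python. This is a toy model of a print spooler, not a real one; names like printer_daemon are invented for the illustration:

```python
import queue
import threading
import time

print_queue = queue.Queue()   # the spool: a buffer between users and the printer

def printer_daemon():
    # The "slow device" drains the spool at its own rate.
    while True:
        doc = print_queue.get()
        if doc is None:                 # sentinel: shut the daemon down
            break
        time.sleep(0.01)                # pretend printing is slow
        print(f"printed {doc}")
        print_queue.task_done()

t = threading.Thread(target=printer_daemon)
t.start()

# The user submits several jobs and is immediately free to do other work.
for doc in ("report.txt", "photo.png", "notes.md"):
    print_queue.put(doc)

print_queue.join()                      # wait until the spool is drained
print_queue.put(None)
t.join()
```

The loop that submits jobs returns immediately, just as the user above is free to keep working while printing proceeds in the background.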
The temporary storage area to which
e-mail is delivered by a Mail Transfer Agent and in which it waits to be picked up by a Mail User Agent is sometimes called a mail spool. Likewise, a storage area for Usenet articles may be referred to as a news spool. (On Unix-like systems, these areas are usually located in the /var/spool directory.) Unlike other spools, mail and news spools usually allow random access to individual messages.

Origin of the term

Magnetic recording tape wound onto a spool or reel.
"Spool" is supposedly an
acronym for simultaneous peripheral operations on-line (although this is thought by some to be a backronym), or as for printers: simultaneous peripheral output on line (yes, not on-line but on line). Early mainframe computers had, by current standards, small and expensive hard disks. These costs made it necessary to reserve the disks for files that required random access, while writing large sequential files to reels of tape. Typical programs would run for hours and produce hundreds or thousands of pages of printed output. Periodically the program would stop printing for a while to do a lengthy search or sort. But it was desirable to keep the printer(s) running continuously. Thus, when a program was running, it would write the printable file to a spool of tape, which would later be read back in by the program controlling the printer.
Spooling also improved the multiprogramming capability of systems. Most programs required input, and produced output, using slow peripherals such as card readers and line printers. Without spooling, the number of tasks that could be multiprogrammed could be limited by the availability of peripherals; with spooling, a task did not need access to a real device: slow peripheral input and output for several tasks could be held on shared system storage, written and read by separate system processes running asynchronously with those tasks.



Spooling
Acronym for simultaneous peripheral operations on-line, spooling refers to putting
jobs in a buffer, a special area in memory or on a disk where a device can access them when it is ready. Spooling is useful because devices access data at different rates. The buffer provides a waiting station where data can rest while the slower device catches up.

Data Spooling
Bacula allows you to specify that you want the Storage daemon to initially write your data to disk and then subsequently to tape. This serves several important purposes.
It can take a long time for data to come in from the File daemon during an Incremental backup. If it is written directly to tape, the tape will start and stop (or "shoe-shine," as it is often called), causing tape wear. By first writing the data to disk, then writing it to tape, the tape can be kept in continual motion.
While the spooled data is being written to the tape, the despooling process has exclusive use of the tape. This means that you can spool multiple simultaneous jobs to disk, then have them very efficiently despooled one at a time without having the data blocks from several jobs intermingled, thus substantially improving the time needed to restore files. While despooling, all jobs spooling continue running.
Writing to a tape can be slow. By first spooling your data to disk, you can often reduce the time the File daemon is running on a system, thus reducing downtime, and/or interference with users. Of course, if your spool device is not large enough to hold all the data from your File daemon, you may actually slow down the overall backup.
Data spooling is exactly that: "spooling." It is not a way to first write a "backup" to a disk file and then to a tape. When the backup has only been spooled to disk, it is not complete yet and cannot be restored until it is written to tape. In a future version, Bacula will support writing a backup to disk and then later Migrating or Copying it to a tape.
The remainder of this chapter explains the various directives that you can use in the spooling process.

Data Spooling Directives
The following directives can be used to control data spooling.
To turn data spooling on/off at the Job level in the Job resource in the Director's conf file (default no).
SpoolData = yes|no
To override the Job specification in a Schedule Run directive in the Director's conf file.
SpoolData = yes|no
To limit the maximum total size of the spooled data for a particular device. Specified in the Device resource of the Storage daemon's conf file (default unlimited).
Maximum Spool Size = size, where size is the maximum spool size for all jobs, specified in bytes.
To limit the maximum total size of the spooled data for a particular device for a single job. Specified in the Device Resource of the Storage daemon's conf file (default unlimited).
Maximum Job Spool Size = size, where size is the maximum spool file size for a single job, specified in bytes.
To specify the spool directory for a particular device. Specified in the Device Resource of the Storage daemon's conf file (default, the working directory).
Spool Directory = directory
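Putting the directives above together, a configuration might look roughly like the following sketch. The job and device names, sizes, and paths here are hypothetical placeholders, not recommendations:

```
# In the Director's conf file, in a Job resource (hypothetical names):
Job {
  Name = "nightly-backup"
  SpoolData = yes                        # spool to disk before writing to tape
  ...
}

# In the Storage daemon's conf file, in the matching Device resource:
Device {
  Name = "TapeDrive-1"
  Spool Directory = /var/bacula/spool    # keep this out of your backups!
  Maximum Spool Size = 100000000000      # total across all jobs, in bytes
  Maximum Job Spool Size = 50000000000   # limit for any single job, in bytes
  ...
}
```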

!!! MAJOR WARNING !!!
Please be very careful to exclude the spool directory from any backup, otherwise, your job will write enormous amounts of data to the Volume, and most probably terminate in error. This is because in attempting to backup the spool file, the backup data will be written a second time to the spool file, and so on ad infinitum.
It is also advisable to always specify a maximum spool size so that your disk doesn't completely fill up. In principle, data spooling will properly detect a full disk and despool data, allowing the job to continue. However, attribute spooling is not so kind to the user: if the disk on which attributes are being spooled fills, the job will be canceled. In addition, if your working directory is on the same partition as the spool directory, then Bacula jobs will fail, possibly in bizarre ways, when the spool fills.

Other Points
When data spooling is enabled, Bacula automatically turns on attribute spooling. In other words, it also spools the catalog entries to disk. This is done so that in case the job fails, there will be no catalog entries pointing to non-existent tape backups.
Attribute despooling is done at the end of the job. As a consequence, after Bacula stops writing the data to the tape, there may be a pause while the attributes are sent to the Director and entered into the catalog before the job terminates.
Attribute spool files are always placed in the working directory.
When Bacula begins despooling data spooled to disk, it takes exclusive use of the tape. This has the major advantage that in running multiple simultaneous jobs at the same time, the blocks of several jobs will not be intermingled.
It probably does not make a lot of sense to enable data spooling if you are writing to disk files.
It is probably best to provide as large a spool file as possible to avoid repeatedly spooling/despooling. Also, while a job is despooling to tape, the File daemon must wait (i.e. spooling stops for the job while it is despooling).
If you are running multiple simultaneous jobs, Bacula will continue spooling other jobs while one is despooling to tape, provided there is sufficient spool file space.

Batch System

Batch processing is execution of a series of programs ("jobs") on a computer without human interaction.
Batch jobs are set up so they can be run to completion without human interaction, so all input data is preselected through
scripts or command-line parameters. This is in contrast to interactive programs, which prompt the user for such input.
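A minimal example of this style of job, sketched in Python: all input arrives as command-line parameters, so the script can run to completion with no user present. The script and its names are hypothetical:

```python
# batch_total.py -- all input is preselected via command-line parameters,
# so the job can run unattended from a scheduler; it never prompts the user.
import argparse

def total(numbers):
    """The actual batch work: a pure function of its preselected inputs."""
    return sum(numbers)

def main(argv=None):
    parser = argparse.ArgumentParser(description="Toy batch job: sums numbers.")
    parser.add_argument("numbers", nargs="+", type=float,
                        help="input data, preselected by the caller")
    args = parser.parse_args(argv)
    print(f"total={total(args.numbers):g}")

if __name__ == "__main__":
    main()
```

A scheduler could then run, say, python batch_total.py 1 2 3 on a quiet machine at night, with the output captured to a file rather than shown to a waiting user.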
Batch processing has these benefits:
It allows sharing of computer resources among many users,
It shifts the time of job processing to when the computing resources are less busy,
It avoids idling the computing resources with minute-by-minute human interaction and supervision,
By keeping a high overall rate of utilization, it better amortizes the cost of a computer, especially an expensive one.
Batch processing has been associated with
mainframe computers since the earliest days of electronic computing in the 1950s. Because such computers were enormously costly, batch processing was the only economically viable way to use them. In those days, interactive sessions with either text-based computer terminal interfaces or graphical user interfaces were not widespread. Initially, computers were not even capable of having multiple programs loaded into main memory.
Batch processing has grown beyond its mainframe origins, and is now frequently used in
UNIX environments, where the cron and at facilities allow for scheduling of complex job scripts. Similarly, Microsoft DOS and Windows systems refer to their command-scripting language as batch files and Windows has a job scheduler.
A popular computerized batch processing procedure is printing. This normally involves the operator selecting the documents they need printed and indicating to the batch printing software when and where they should be output. Batch processing is also used for efficient bulk database updates and automated
transaction processing, as contrasted to interactive online transaction processing (OLTP) applications.