COS 4: Spooling

Computer science, spooling refers to a process of communicating data to another program by placing it in a temporary working area, where the other program can access it at some later point in time. Traditional uses of the term spooling apply it to situations where there is little or no direct communication between the program that writes the data and the program that reads it. Spooling is often used when devices access data at different rates. The temporary working area provides a waiting station where data can reside while a slower device can process it at its own rate. New data is only added and deleted at the ends of the area, i.e., there is no random access or editing.
It can also refer to a storage device that incorporates a physical spool, such as a tape drive.
The most common spooling application is print spooling. In print spooling, documents are loaded into a buffer (usually an area on a disk), and then the printer pulls them off the buffer at its own rate. Because the documents are in a buffer where they can be accessed by the printer, the user is free to perform other operations on the computer while the printing takes place in the background. Spooling also lets users place a number of print jobs in a queue instead of waiting for each one to finish before specifying the next one.
The temporary storage area to which e-mail is delivered by a Mail Transfer Agent and in which it waits to be picked up by a Mail User Agent is sometimes called a mail spool. Likewise, a storage area for Usenet articles may be referred to as a news spool. (On Unix-like systems, these areas are usually located in the /var/spool directory.) Unlike other spools, mail and news spools usually allow random access to individual messages.

Origin of the term

Magnetic recording tape wound onto a spool or reel.
"Spool" is supposedly an acronym for simultaneous peripheral operations on-line (although this is thought by some to be a backronym), or as for printers: simultaneous peripheral output on line (yes, not on-line but on line). Early mainframe computers had, by current standards, small and expensive hard disks. These costs made it necessary to reserve the disks for files that required random access, while writing large sequential files to reels of tape. Typical programs would run for hours and produce hundreds or thousands of pages of printed output. Periodically the program would stop printing for a while to do a lengthy search or sort. But it was desirable to keep the printer(s) running continuously. Thus, when a program was running, it would write the printable file to a spool of tape, which would later be read back in by the program controlling the printer.
Spooling also improved the multiprogramming capability of systems. Most programs required input, and produced output, using slow peripherals such as card-readers and lineprinters. Without spooling, the number of tasks that could be multiprogrammed could be limited by the availability of peripherals; with spooling, a task did not need access to a real device: slow peripheral input and output for several tasks could be held on shared system storage, written and read by separate system processes running asynchronously with those tasks

Magnetic recording tape wound onto a spool or reel.

Spooling
Acronym for simultaneous peripheral operations on-line, spooling refers to putting jobs in a buffer, a special area in memory or on a disk where a device can access them when it is ready. Spooling is useful because devices access data at different rates. The buffer provides a waiting station where data can rest while the slower device catches up.
The most common spooling application is print spooling. In print spooling, documents are loaded into a buffer (usually an area on a disk), and then the printer pulls them off the buffer at its own rate. Because the documents are in a buffer where they can be accessed by the printer, you can perform other operations on the computer while the printing takes place in the background. Spooling also lets you place a number of print jobs on a queue instead of waiting for each one to finish before specifying the next one.

Data Spooling
Bacula allows you to specify that you want the Storage daemon to initially write your data to disk and then subsequently to tape. This serves several important purposes.
It take a long time for data to come in from the File daemon during an Incremental backup. If it is directly written to tape, the tape will start and stop or shoe-shine as it is often called causing tape wear. By first writing the data to disk, then writing it to tape, the tape can be kept in continual motion.
While the spooled data is being written to the tape, the despooling process has exclusive use of the tape. This means that you can spool multiple simultaneous jobs to disk, then have them very efficiently despooled one at a time without having the data blocks from several jobs intermingled, thus substantially improving the time needed to restore files. While despooling, all jobs spooling continue running.
Writing to a tape can be slow. By first spooling your data to disk, you can often reduce the time the File daemon is running on a system, thus reducing downtime, and/or interference with users. Of course, if your spool device is not large enough to hold all the data from your File daemon, you may actually slow down the overall backup.
Data spooling is exactly that "spooling". It is not a way to first write a "backup" to a disk file and then to a tape. When the backup has only been spooled to disk, it is not complete yet and cannot be restored until it is written to tape. In a future version, Bacula will support writing a backup to disk then later Migrating or Copying it to a tape.
The remainder of this chapter explains the various directives that you can use in the spooling process.
Data Spooling Directives
The following directives can be used to control data spooling.
To turn data spooling on/off at the Job level in the Job resource in the Director's conf file (default no).
SpoolData = yesno
To override the Job specification in a Schedule Run directive in the Director's conf file.
SpoolData = yesno
To limit the maximum total size of the spooled data for a particular device. Specified in the Device resource of the Storage daemon's conf file (default unlimited).
Maximum Spool Size = size Where size is a the maximum spool size for all jobs specified in bytes.
To limit the maximum total size of the spooled data for a particular device for a single job. Specified in the Device Resource of the Storage daemon's conf file (default unlimited).
Maximum Job Spool Size = size Where size is the maximum spool file size for a single job specified in bytes.
To specify the spool directory for a particular device. Specified in the Device Resource of the Storage daemon's conf file (default, the working directory).
Spool Directory = directory

!!! MAJOR WARNING !!!
Please be very careful to exclude the spool directory from any backup, otherwise, your job will write enormous amounts of data to the Volume, and most probably terminate in error. This is because in attempting to backup the spool file, the backup data will be written a second time to the spool file, and so on ad infinitum.
Another advice is to always specify the maximum spool size so that your disk doesn't completely fill up. In principle, data spooling will properly detect a full disk, and despool data allowing the job to continue. However, attribute spooling is not so kind to the user. If the disk on which attributes are being spooled fills, the job will be canceled. In addition, if your working directory is on the same partition as the spool directory, then Bacula jobs will fail possibly in bizarre ways when the spool fills.

Other Points
When data spooling is enabled, Bacula automatically turns on attribute spooling. In other words, it also spools the catalog entries to disk. This is done so that in case the job fails, there will be no catalog entries pointing to non-existent tape backups.
Attribute despooling is done at the end of the job, as a consequence, after Bacula stops writing the data to the tape, there may be a pause while the attributes are sent to the Directory and entered into the catalog before the job terminates.
Attribute spool files are always placed in the working directory.
When Bacula begins despooling data spooled to disk, it takes exclusive use of the tape. This has the major advantage that in running multiple simultaneous jobs at the same time, the blocks of several jobs will not be intermingled.
It probably does not make a lot of sense to enable data spooling if you are writing to disk files.
It is probably best to provide as large a spool file as possible to avoid repeatedly spooling/despooling. Also, while a job is despooling to tape, the File daemon must wait (i.e. spooling stops for the job while it is despooling).
If you are running multiple simultaneous jobs, Bacula will continue spooling other jobs while one is despooling to tape, provided there is sufficient spool file space.

COS 4

Tuesday, November 20, 2007

Spooling

0 comments:

MY PIC

Blog Archive

About Me