Chapter 8. Troubleshooting
This chapter describes how to troubleshoot library and ACSLS errors. You can resolve some errors, but others may
require assistance from StorageTek. This chapter describes the following troubleshooting facilities and procedures:
This section describes error recovery that ACSLS and the library hardware provide. If an individual process or a non-
critical library component fails, ACSLS records the error in the ACSLS event log and continues to provide library
services with the unaffected parts of the system.
If a major system failure occurs, however, library operations are suspended until the error is corrected. The following
sections describe how ACSLS and the library hardware respond to communications, hardware, and software failures.
Communications Failures
Communications failures include the failure of communications lines between ACSLS and an LMU or between an LMU
and an LCU. Either hardware or software errors can cause these communications failures.
Communications software failures also include the failure of interprocess communication between ACSLS and the CSI
(client interface) or cmd_proc.
If ACSLS cannot communicate with another library component, it logs an error and retries until contact is established or
until a system-defined timeout period is reached.
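The retry behavior described above can be pictured with a short Python sketch. This is only an illustration of retrying until contact or timeout, not the ACSLS implementation; the function name, the connect callable, and the interval are hypothetical.

```python
import time

def retry_until(connect, timeout_s, interval_s=1.0,
                clock=time.monotonic, sleep=time.sleep):
    """Call connect() until it returns True or timeout_s elapses.

    Returns the number of attempts on success and raises TimeoutError
    if the timeout is reached first. connect is a hypothetical callable
    standing in for "try to contact the library component".
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        if connect():
            return attempts              # contact established
        if clock() >= deadline:
            raise TimeoutError(f"no contact after {attempts} attempts")
        sleep(interval_s)                # wait before the next retry
```

At least one attempt is always made, even with a timeout of zero, so a component that is already reachable never causes a spurious timeout.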
Hardware Failures
Hardware reliability and redundancy can allow library operations to continue even if one component fails. For example:
A dual-LMU configuration switches to the standby LMU if the master fails; for more information, see "Managing
a Dual-LMU Configuration".
A dual-LAN client configuration switches to the backup LAN if the primary fails; for more information, see
"Managing a Dual-LAN Client Configuration".
Other hardware failures, however, can suspend library operations until the failed hardware is repaired or replaced. The
following list describes typical hardware failures and their effect on library operations:
Hint: If your LSM fails and you take it off line, you can still manually load volumes into the library drives if
the data path is still operational. For more information, see "Manually Loading Volumes Into Drives In a Disabled
LSM".
CAP failure
If a CAP fails, you cannot enter cartridges into or eject cartridges from the affected LSM through that CAP. All
other library processes can continue normally. If the affected LSM has multiple CAPs, you can use another CAP.
If the affected LSM is connected to another LSM via a PTP, you can use the second LSM's CAP for enter and eject
operations.
Software Failures
Major software failures include a system crash, a database failure, or a library configuration inconsistency. These errors
result in loss of library operations in all affected ACSs. After the problem is corrected, ACSLS goes through automatic
recovery procedures to restore library operations.
ACSLS and the Operating System provide the following software facilities:
Standard OS utilities
Standard Operating System utilities can be used to create core files of suspect processes and to inspect those files
for further information. Examples of such utilities are kill and gcore.
If you have access and permissions to these files, you can access information in these files via any standard
terminal (or modem) interface.
Field patching of binary modules is not supported. Even in the case of command files, the preferred method of field
correction is by issuing a new software release, requiring a full installation. For severe field errors, individual binary or
The ACSLS event log contains information about library events and errors. All ACSLS components log events to the
event log through the centralized event logger. The base event log, which is automatically created when ACSLS is
installed, is contained in the file /export/home/ACSSS/log/acsss_event.log.
library errors
Both fatal and nonfatal hardware and software errors are logged. Examples include LSM failures, problems with
cartridges, database errors, interprocess and library communications failures, and software failures not normally
handled by the operating system.
significant events
These are normal events that can help you manage the library. For example, events are logged when an audit is
initiated or terminated, a device changes state, a volume is entered or ejected, or a CAP is opened or closed.
You should browse the event log periodically to help manage ACSLS and the library. Event log entries are
particularly useful after:
An audit
ACSLS recovery
When you log into ACSLS as the acssa user, a window with a tail of the event log is included on the standard terminal
display; for more information, see "ACSLS User IDs".
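If you are not logged in as acssa, a similar view of the most recent entries can be produced with a few lines of Python. This is a minimal sketch, assuming the event log is a plain text file at the path given earlier in this chapter; the function name is hypothetical and no assumptions are made about the format of individual entries.

```python
from collections import deque

# Path of the base event log as stated in this chapter.
EVENT_LOG = "/export/home/ACSSS/log/acsss_event.log"

def tail(path, n=20):
    """Return the last n lines of a text file, without trailing newlines."""
    # A bounded deque keeps only the newest n lines while the file is read.
    with open(path, "r", errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

For example, `tail(EVENT_LOG, 50)` returns the 50 most recent lines, provided the current user has read permission on the file.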
You use the acsss_config configuration program to specify the following:
Whether the event log logs cartridge enters in automatic enter mode
Whether the event log logs database volume additions and deletions
The following are errors that may occur as part of event log processing.
If a communication failure occurs while the event logger is sending a message to cmd_proc, the unsolicited
message is lost.
The following unsolicited message is displayed if the event logger is unable to access or write to the event log file.
This may be due to incorrect permissions on the directory or the file.
The /export/home/ACSSS/log directory, which contains the ACSLS event log, also contains logs for the installation
script and for each of the ACSLS utilities. Table 6 summarizes the contents of these logs.
You must manually manage the utility logs because they do not have the same file sizing and rollover options that you
can use to automatically manage the ACSLS event log.
The acsss_config.log is a report of library hardware configured by the acsss_config program. StorageTek
recommends that you retain this log, which provides a useful record of your library hardware.
The cron_event.log logs events for the cron job that periodically runs the full_disk.sh script. The full_disk.sh.log
logs events for the full_disk.sh script, which does the following:
Logs warning messages in the ACSLS event log if the ACSLS home directory (/export/home/ACSSS) exceeds
85% full
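The threshold check can be approximated as follows. This sketch is not the full_disk.sh script itself; it assumes the 85% figure applies to the file system that holds the ACSLS home directory, and the function names are hypothetical. The percentage may differ slightly from df output on file systems that reserve blocks.

```python
import shutil

ACS_HOME = "/export/home/ACSSS"   # ACSLS home directory named in this chapter
THRESHOLD_PCT = 85                # warning threshold stated for full_disk.sh

def percent_used(path):
    """Percentage of the file system containing path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def is_over_threshold(pct_used, threshold=THRESHOLD_PCT):
    """True when usage exceeds the threshold (strictly greater than)."""
    return pct_used > threshold
```

On an ACSLS server, `is_over_threshold(percent_used(ACS_HOME))` indicates whether a warning would be expected.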
Because the cron_event.log and the full_disk.sh.log record periodic events, you should not remove these logs. To
conserve disk space, however, you may want to periodically edit these files to remove older, less meaningful entries.
You use the rdb.acsss utility for disaster recovery of the database. "Maintaining the ACSLS Database" describes the
automatic database backup that ACSLS provides and the manual backup to tape that you should also do.
The event log records database errors, and these entries can help you determine whether you need to restore the database.
You will probably need to restore the database in any of the following situations:
The rdb.acsss utility loads the database checkpoint backup, then sequentially applies any available redo log files to
make the database as current as possible. If all redo log files are available, the database can be restored to its state just
before the failure, with essentially no loss of data. If you lost any redo log files, you must audit the library after the
restoration to make the database current.
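The following Python sketch models the checkpoint-plus-redo idea in miniature. It is a conceptual illustration only: the data layout, sequence numbers, and volume locations are hypothetical and do not reflect the files that rdb.acsss actually reads.

```python
def restore(checkpoint_seq, checkpoint_state, redo_logs, last_seq):
    """Replay redo logs on top of a checkpoint, in sequence order.

    checkpoint_state maps volume id -> location at checkpoint time.
    redo_logs maps sequence number -> list of (vol_id, location) changes;
    a location of None means the volume left the library.
    last_seq is the number of the last redo log written before the failure.
    Returns (state, complete). complete is False if a log was missing, in
    which case replay stops and an audit is needed to make the state current.
    """
    state = dict(checkpoint_state)           # never modify the backup itself
    for seq in range(checkpoint_seq + 1, last_seq + 1):
        changes = redo_logs.get(seq)
        if changes is None:
            return state, False              # gap: later logs cannot be applied
        for vol_id, location in changes:
            if location is None:
                state.pop(vol_id, None)
            else:
                state[vol_id] = location
    return state, True
```

Replay stops at the first missing log because applying later changes on top of an unknown intermediate state could record volumes in the wrong place.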
The type of failure and whether you have a single-disk or two-disk server determine whether all redo log files are
available. A two-disk server greatly improves the likelihood that all redo log files will be available; for more
information, see "Maintaining the ACSLS Database". To restore the database completely, you must make sure the
second disk is mounted before running the rdb.acsss utility.
Caution: After installing ACSLS, always run bdb.acsss to back up the database.
Use the following procedures to restore the database from tape. These procedures also restore miscellaneous data files,
such as access control and custom volume report files. If you are restoring from a tape backup, make sure it is the most
current backup prior to losing the database.
Hint: If there has been physical damage to the area of the disk where the database resides, you may need to rebuild
all or part of the file system and reload the database software before beginning the restoration. This may include
reformatting the lost disk or replacing the disk entirely. In this case, contact StorageTek for assistance.
The following sections describe how to restore the database and miscellaneous library resource files from the following:
The default tape device attached and configured to the ACSLS server.
A UNIX file.
Hint: After running rdb.acsss, review the log created in $ACS_HOME/log/rdb_event.log for any problems the
utility may have encountered. Depending on how thorough the restoration was, you may need to do either or both of the
following to bring the database current:
If the hardware has changed since the last backup (such as adding a new LSM or transport), you will need to rerun
acsss_config.
If the library contents have changed significantly since the creation of the backup tape (for example, enters, ejects,
and scratch mounts), and there were no redo log files available, you will need to audit the library.
Restoring From The Default Tape Device Attached to the ACSLS Server
To restore the ACSLS database from the default tape device attached to the ACSLS server after a database
failure, do the following:
1. Log in as acsss.
2. Stop ACSLS:
kill.acsss
3. Stop the database:
db_command stop
4. Ensure any existing redo log files are available to the database system.
If the hardware configuration includes only a primary disk and the primary disk fails, there may not be any redo
log files available. In this case, the database may only be restored to the point at which bdb.acsss was last
executed.
If the hardware configuration includes a second disk for database support and the primary disk fails, the redo log
files are available for restoration. The user, however, must verify the second disk is installed and its partition is
mounted. This can be verified by executing the df command from any window, then making sure the second disk
primary and backup partitions are mounted.
If the hardware configuration includes a second disk for database support and the second disk fails, you must
deinstall the second disk using sd_mgr.sh. No data is lost, but running bdb.acsss more frequently is advised.
When the second disk is replaced, rerun sd_mgr.sh to reinstall second disk support.
If the hardware configuration included a second disk for database support and both the primary and second disks
fail simultaneously, the redo log files may not be available for restoration.
5. Make sure the most current backup tape is write-protected, then insert it into the tape device.
6. Run the restore utility:
rdb.acsss
7. Respond to any prompts. Depending on the nature of the database problem, the utility may prompt you for additional
information.
8. Remove the tape from the drive and insert a new backup tape.
10. Wait for the backup to complete. Remove the tape from the drive, write-protect it, label it with the current
date, and store it in a safe place.
11. Start ACSLS:
rc.acsss
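Step 4 of the procedure above relies on df to confirm that the second disk partitions are mounted. The same check can be scripted; the sketch below is one way to do it, with a hypothetical function name. The mount point paths depend on your installation and must be supplied by you.

```python
import os

def unmounted(mount_points):
    """Return the entries of mount_points that are not currently mounted."""
    return [p for p in mount_points if not os.path.ismount(p)]
```

For example, `unmounted(["/second_disk"])` returns an empty list when a file system is mounted at /second_disk, and returns the path itself when nothing is mounted there or the directory does not exist. Add the backup partition's mount point to the list as configured on your server.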
To restore the ACSLS database from a specified tape device attached to the ACSLS server after a database
failure, do the following:
1. Log in as acsss.
2. Stop ACSLS:
kill.acsss
3. Stop the database:
db_command stop
4. Ensure any existing redo log files are available to the database system.
If the hardware configuration includes only a primary disk and the primary disk fails, there may not be any redo
log files available. In this case, the database may only be restored to the point at which bdb.acsss was last
executed.
If the hardware configuration includes a second disk for database support and the primary disk fails, the redo log
files are available for restoration. The user, however, must verify the second disk is installed and its partition is
mounted. This can be verified by executing the df command from any window, then making sure the second disk
primary and backup partitions are mounted.
If the hardware configuration includes a second disk for database support and the second disk fails, you must
deinstall the second disk using sd_mgr.sh. No data is lost, but running bdb.acsss more frequently is advised.
When the second disk is replaced, rerun sd_mgr.sh to reinstall second disk support.
If the hardware configuration included a second disk for database support and both the primary and second disks
fail simultaneously, the redo log files may not be available for restoration.
5. Make sure the most current backup tape is write-protected, then insert it into the tape device.
6. Run the restore utility, specifying the tape device:
rdb.acsss -f tape_device
7. Respond to any prompts. Depending on the nature of the database problem, the utility may prompt you for additional
information.
8. Remove the tape from the drive and insert a new backup tape.
10. Wait for the backup to complete. Remove the tape from the drive, write-protect it, label it with the current
date, and store it in a safe place.
11. Start ACSLS:
rc.acsss
Example: To restore the ACSLS database from tape device /dev/rmt/2, enter the following command:
rdb.acsss -f /dev/rmt/2
Caution: rdb.acsss does not check the contents of the file to verify that it contains the backed up database and
miscellaneous library resource file! Make sure you specify the correct file!
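Because rdb.acsss does not validate the file, a basic sanity check before the restore can catch a mistyped path. The Python sketch below uses a hypothetical function name and only confirms that the path is a non-empty, readable regular file. It cannot confirm that the file actually contains an ACSLS backup, and it is not suitable for tape device paths.

```python
import os

def backup_file_problems(path):
    """Return a list of reasons why path looks unusable as a backup file.

    An empty list means the basic checks passed. It does not mean the
    contents are a valid database backup.
    """
    if not os.path.isfile(path):
        return ["not a regular file"]
    problems = []
    if os.path.getsize(path) == 0:
        problems.append("file is empty")
    if not os.access(path, os.R_OK):
        problems.append("file is not readable by the current user")
    return problems
```

Run the check as the acsss user, since that is the user who runs rdb.acsss in the procedure.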
1. Log in as acsss.
2. Stop ACSLS:
kill.acsss
3. Stop the database:
db_command stop
4. Ensure any existing redo log files are available to the database system.
If the hardware configuration includes only a primary disk and the primary disk fails, there may not be any redo
log files available. In this case, the database may only be restored to the point at which bdb.acsss was last
executed.
If the hardware configuration includes a second disk for database support and the primary disk fails, the redo log
files are available for restoration. The user, however, must verify the second disk is installed and its partition is
mounted. This can be verified by executing the df command from any window, then making sure the second disk
primary and backup partitions are mounted.
If the hardware configuration includes a second disk for database support and the second disk fails, you must
deinstall the second disk using sd_mgr.sh. No data is lost, but running bdb.acsss more frequently is advised.
When the second disk is replaced, rerun sd_mgr.sh to reinstall second disk support.
If the hardware configuration included a second disk for database support and both the primary and second disks
fail simultaneously, the redo log files may not be available for restoration.
rdb.acsss -f db_file
Where:
db_file specifies a UNIX file that contains the ACSLS database backup.
9. Wait for the backup to complete. Remove the tape from the drive, write-protect it, label it with the current
date, and store it in a safe place.
10. Start ACSLS:
rc.acsss
The ACSLS database structure may be corrupted if you are experiencing all of the following problems:
You cannot do basic ACSLS functions, such as mounts, dismounts, enters, and ejects
Restoring the database using the most current backup does not fix the problem
If the ACSLS database is corrupted, you must delete the database, reinstall the database and reconfigure ACSLS, and run
an audit.
idle
kill.acsss
db_command stop
db_command abort
cd /export/home
rm -r oracle
8. Insert the ACSLS distribution media and change to the installation directory.
For more information, see the ACSLS Installation and Configuration Guide for your platform and version.
./install.sh
11. When the "installation complete" message appears, remove the distribution media.
12. Respond to the prompts for database backup directory, automatic startup, and adding a modem.
For more information, see the ACSLS Installation and Configuration Guide for your platform and version.
13. At the next prompt, press <CTRL> + <c> to exit the installation script.
For more information, see the ACSLS Installation and Configuration Guide for your platform and version.
rc.acsss
If you have a second server disk installed, use this procedure to recover from a primary disk failure.
For more information, see the ACSLS Installation and Configuration Guide for your platform and version.
mkdir /second_disk
5. Add entries for the second disk and second disk backup mount points to the /etc/vfstab file.
mount -a
8. Configure ACSLS.
For more information, see the ACSLS Installation and Configuration Guide for your platform and version.
For more information, see "How to Restore the Database". The redo log files residing on the second disk will be
applied to the backup version to bring the database up to date.