Chapter 8. Troubleshooting
This chapter describes how to troubleshoot library and ACSLS errors. You can resolve some errors yourself, but others may
require assistance from StorageTek. The chapter covers the following troubleshooting facilities and procedures:

• ACSLS and library hardware error recovery

• ACSLS event log

• ACSLS Installation Script and Utility Logs

• Restoring the ACSLS database

• Recovering from primary disk failure

ACSLS and Library Hardware Error Recovery

This section describes error recovery that ACSLS and the library hardware provide. If an individual process or a
non-critical library component fails, ACSLS records the error in the ACSLS event log and continues to provide library
services with the unaffected parts of the system.

If a major system failure occurs, however, library operations are suspended until the error is corrected. The following
sections describe how ACSLS and the library hardware respond to communications, hardware, and software failures.

Communications Failures

Communications failures include the failure of communications lines between ACSLS and an LMU or between an LMU
and an LCU. Either hardware or software errors can cause these communications failures.

Communications software failures also include the failure of interprocess communication between ACSLS and the CSI
(client interface) or cmd_proc.

If ACSLS cannot communicate with another library component, it logs an error and retries until contact is established or
until a system-defined timeout period is reached.

Hardware Failures

Hardware reliability and redundancy can allow library operations to continue even if one component fails. For example:

• A dual-LMU configuration switches to the standby LMU if the master fails; for more information, see "Managing
a Dual-LMU Configuration".

• A dual-LAN client configuration switches to the backup LAN if the primary fails; for more information, see
"Managing a Dual-LAN Client Configuration".

file://H:\study\netbackup\Upload_site_done\done\New Folder\Chapter 8_ Troubleshooting_acsls.htm 7/6/2010


Other hardware failures, however, can suspend library operations until the failed hardware is repaired or replaced. The
following list describes typical hardware failures and their effect on library operations:

LSM robot failure


Complete loss of robot function makes the affected LSM unavailable.

Loss of robot hands, cameras, or lights


Library processing in an LSM can continue in a degraded mode if the robot loses only one hand, camera, or light.
If both hands, cameras, or lights fail, however, the LSM becomes unavailable.

Hint: If your LSM fails and you take it offline, you can still manually load volumes into the library drives if
the data path is still operational. For more information, see "Manually Loading Volumes Into Drives In a Disabled
LSM".

CAP failure
If a CAP fails, you cannot enter or eject cartridges through that CAP. All other library processes can continue
normally. If the affected LSM has multiple CAPs, you can use another CAP. If the affected LSM is connected to
another LSM via a PTP, you can use the second LSM's CAP for enter and eject operations.

Software Failures

Major software failures include a system crash, a database failure, or a library configuration inconsistency. These errors
result in loss of library operations in all affected ACSs. After the problem is corrected, ACSLS goes through automatic
recovery procedures to restore library operations.

Tracking Software Problems

ACSLS and the operating system provide the following software facilities:

ACSLS event log


This log contains a time-stamped history of significant events to help troubleshoot software problems. See "Event
Log" for a detailed description of this log and its uses.

Standard OS utilities
Standard operating system utilities can be used to create core files of suspect processes and to inspect those files
for further information. Examples of such utilities are kill and gcore.

If you have the necessary access permissions, you can examine these files from any standard
terminal (or modem) interface.
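
As a hedged illustration of the utilities named above, the following sketch wraps gcore in a small shell function. The function name dump_core and the default output prefix are assumptions made for this example and are not part of ACSLS. Here kill -0 is used only to test that the process exists, and the -o option of gcore is assumed to behave as it does on Solaris and Linux.

```shell
# Sketch only: dump_core and its default prefix are hypothetical names for this example.
dump_core() {
    pid=${1:-}
    prefix=${2:-core}

    # Accept only a numeric process ID.
    case $pid in
        ''|*[!0-9]*)
            echo "usage: dump_core PID [PREFIX]" >&2
            return 2
            ;;
    esac

    # Signal 0 delivers nothing; it only tests whether the process can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "process $pid does not exist or cannot be signalled" >&2
        return 1
    fi

    if ! command -v gcore >/dev/null 2>&1; then
        echo "gcore is not installed on this system" >&2
        return 127
    fi

    # gcore writes a core image of the running process, named after the prefix.
    gcore -o "$prefix" "$pid"
}
```

A call such as dump_core 1234 /var/tmp/suspect would then leave a core image under /var/tmp for later inspection; both the process ID and the path are placeholders.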

Distributing Software Corrections

Field patching of binary modules is not supported. Even in the case of command files, the preferred method of field
correction is by issuing a new software release, requiring a full installation. For severe field errors, individual binary or
command files can be issued and replaced.

ACSLS Event Log

The ACSLS event log contains information about library events and errors. All ACSLS components log events to the
event log through the centralized event logger. The base event log, which is automatically created when ACSLS is
installed, is contained in the file /export/home/ACSSS/log/acsss_event.log.

Logged events include the following:

library errors
Both fatal and nonfatal hardware and software errors are logged. Examples include LSM failures, problems with
cartridges, database errors, interprocess and library communications failures, and software failures not normally
handled by the operating system.

significant events
These are normal events that can help you manage the library. For example, events are logged when an audit is
initiated or terminated, a device changes state, a volume is entered or ejected, or a CAP is opened or closed.

Using the Event Log

You should browse the event log periodically to help manage ACSLS and the library. Event log entries are
particularly useful after:

• An audit

• A hardware or software failure

• ACSLS recovery

See ACSLS Messages for descriptions of significant event log messages.
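
One way to browse the log after an audit or a failure is to filter it by keyword. The sketch below is only an illustration: the function name search_event_log and its arguments are hypothetical, the default path is the base event log location given earlier in this chapter, and no particular message format is assumed beyond plain text lines.

```shell
# Sketch only: search_event_log is a hypothetical helper, not an ACSLS utility.
search_event_log() {
    keyword=${1:-}
    count=${2:-20}
    logfile=${3:-/export/home/ACSSS/log/acsss_event.log}

    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 1
    fi

    # -i makes the match case-insensitive; tail keeps only the most recent matches.
    grep -i -- "$keyword" "$logfile" | tail -n "$count"
}
```

For example, search_event_log audit 10 would print the ten most recent lines that mention an audit.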

Managing the Event Log

When you log in to ACSLS as the acssa user, a window with a tail of the event log is included on the standard terminal
display; for more information, see "ACSLS User IDs".

You use the acsss_config configuration program to specify the following:

• Event log size and number of rollover files

• Pathname of the directory that contains the event log

• Event log date/time format

• Whether the event log logs cartridge enters in automatic enter mode

• Whether the event log logs database volume additions and deletions

For more information, see "Planning ACSLS Configuration Options".

Event Log Errors

The following errors may occur during event log processing:

• If a communication failure occurs while the event logger is sending a message to cmd_proc, the unsolicited
message is lost.

• The following unsolicited message is displayed if the event logger is unable to access or write to the event log file.
This may be due to incorrect permissions on the directory or the file.

Event log access failed
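
When this message appears, a quick permissions check on the log file and its directory can narrow down the cause. The following is a minimal sketch: check_log_access is a hypothetical name, and it assumes you run it as the account that writes the event log (for example acsss, which is an assumption here rather than something this chapter states).

```shell
# Sketch only: check_log_access is a hypothetical helper, not an ACSLS utility.
check_log_access() {
    logfile=${1:-/export/home/ACSSS/log/acsss_event.log}
    logdir=$(dirname "$logfile")
    status=0

    if [ ! -d "$logdir" ] || [ ! -w "$logdir" ]; then
        echo "directory missing or not writable: $logdir"
        status=1
    fi

    if [ -e "$logfile" ] && [ ! -w "$logfile" ]; then
        echo "file not writable: $logfile"
        status=1
    fi

    # Show owner and mode bits so they can be compared with the expected settings.
    ls -ld "$logdir" "$logfile" 2>/dev/null || true
    return "$status"
}
```

A non-zero return status means that the current user cannot write to the directory or the file.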

ACSLS Installation Script and Utility Logs

The /export/home/ACSSS/log directory, which contains the ACSLS event log, also contains logs for the installation
script and for each of the ACSLS utilities. Table 6 summarizes the contents of these logs.

You must manually manage the utility logs because they do not have the same file sizing and rollover options that you
can use to automatically manage the ACSLS event log.

The acsss_config.log is a report of library hardware configured by the acsss_config program. StorageTek
recommends that you retain this log, which provides a useful record of your library hardware.

The bdb_event.log, export_event.log, import_event.log, install.log, rdb_event.log, sd_event.log, and
volrpt.log log success and failure entries each time you run the corresponding script or utility. These logs are
therefore most useful right after you run the utility. If the utility failed, review its log, correct any errors, and rerun
the utility until it runs successfully. After the utility runs successfully, you may want to delete its log to free disk space
and to ensure that when you next run the utility, the log is recreated and contains only meaningful entries. If you cannot
successfully run the utility, however, save the log, which can help StorageTek resolve the problem.

The cron_event.log logs events for the cron job that periodically runs the full_disk.sh script. The full_disk.sh.log
logs events for the full_disk.sh script, which does the following:

• Logs warning messages in the ACSLS event log if the ACSLS home directory (/export/home/ACSSS) is more than
85% full

• Backs up the ACSLS database as described in "Automatic Database Backup"
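
The 85% check can be reproduced by hand when you want to see how close the file system is to the limit. The sketch below is not the full_disk.sh script itself; usage_percent and check_threshold are hypothetical names, and the parsing assumes the POSIX output format of df -P, in which the fifth column of the second line holds the capacity.

```shell
# Sketch only: not the actual full_disk.sh; both function names are hypothetical.
usage_percent() {
    # df -P prints one line per file system; column 5 is the capacity, such as "42%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

check_threshold() {
    pct=$1
    limit=${2:-85}

    # Warn only when usage is strictly above the limit.
    if [ "$pct" -gt "$limit" ]; then
        echo "WARNING: usage ${pct}% exceeds ${limit}%"
        return 1
    fi
    return 0
}
```

For example, check_threshold "$(usage_percent /export/home/ACSSS)" prints a warning and returns a non-zero status when the file system that holds the ACSLS home directory is more than 85% full.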

Because the cron_event.log and the full_disk.sh.log record periodic events, you should not remove these logs. To
conserve disk space, however, you may want to periodically edit these files to remove older, less meaningful entries.
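
Instead of editing these files by hand, you could keep only the most recent entries. The following is a minimal sketch: trim_log is a hypothetical name, the default of 1000 lines is arbitrary, and the approach assumes that nothing else is writing to the log while it is being trimmed.

```shell
# Sketch only: trim_log is a hypothetical helper; run it when no job is writing to the log.
trim_log() {
    logfile=${1:-}
    keep=${2:-1000}

    if [ ! -f "$logfile" ]; then
        echo "no such file: $logfile" >&2
        return 1
    fi

    tmp="$logfile.trim.$$"
    # Save the newest lines, then copy them back over the original file so that
    # the file keeps its owner and permissions.
    tail -n "$keep" "$logfile" > "$tmp" && cat "$tmp" > "$logfile"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

For example, trim_log /export/home/ACSSS/log/cron_event.log 500 keeps the last 500 lines of the cron log.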

Restoring the Database

You use the rdb.acsss utility for disaster recovery of the database. "Maintaining the ACSLS Database" describes the
automatic database backup that ACSLS provides and the manual backup to tape that you should also do.

The event log records database errors, and these entries can help you determine whether you need to restore the database.
You will probably need to restore the database in any of the following situations:

• After a system crash

• Any time the database cannot be started

• Any time there is a physical or logical error in the database

Extent of the Restoration

The rdb.acsss utility loads the database checkpoint backup, then sequentially applies any available redo log files to
make the database as current as possible. If all redo log files are available, the database can be restored to its state just
before the failure, with essentially no loss of data. If you lost any redo log files, you must audit the library after the
restoration to make the database current.

The type of failure and whether you have a single-disk or two-disk server determine whether all redo log files are
available. A two-disk server greatly improves the likelihood that all redo log files will be available; for more
information, see "Maintaining the ACSLS Database". To restore the database completely, you must make sure the
second disk is mounted before running the rdb.acsss utility.
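
A mount check of this kind can be scripted around df. The sketch below is an illustration under stated assumptions: mount_point_listed and is_mounted are hypothetical names, the parsing relies on the POSIX df -P format (mount point in the last column), and the mount points /second_disk and /second_disk/backup are the ones used in the primary disk recovery procedure later in this chapter.

```shell
# Sketch only: both function names are hypothetical.
# Reads "df -P" output on standard input and succeeds if the given mount
# point appears in the last column of any line after the header.
mount_point_listed() {
    awk -v mp="$1" 'NR > 1 && $NF == mp { found = 1 } END { exit !found }'
}

# Succeeds only if the path itself is a mount point, not merely a directory
# on some other mounted file system.
is_mounted() {
    df -P "$1" 2>/dev/null | mount_point_listed "$1"
}
```

Before running rdb.acsss you might then check both partitions, for example: is_mounted /second_disk && is_mounted /second_disk/backup || echo "second disk is not fully mounted".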

Caution: After installing ACSLS, always run bdb.acsss to back up the database.

How to Restore the Database

Use the following procedures to restore the database from tape. These procedures also restore miscellaneous data files,
such as access control and custom volume report files. If you are restoring from a tape backup, make sure it is the most
current backup prior to losing the database.

Hint: If there has been physical damage to the area of the disk where the database resides, you may need to rebuild
all or part of the file system and reload the database software before beginning the restoration. This may include
reformatting the lost disk or replacing the disk entirely. In this case, contact StorageTek for assistance.

The following sections describe how to restore the database and miscellaneous library resource files from the following:

• The default tape device attached and configured to the ACSLS server

• A specified tape device attached and configured to the ACSLS server

• A UNIX file

Hint: After running rdb.acsss, review the log created in $ACS_HOME/log/rdb_event.log for any problems the
utility may have encountered. Depending on how thorough the restoration was, you may need to do either or both of the
following to bring the database up to date:

• If the hardware has changed since the last backup (for example, a new LSM or transport was added), you will need to
rerun acsss_config.

• If the library contents have changed significantly since the creation of the backup tape (for example, through enters,
ejects, and scratch mounts), and there were no redo log files available, you will need to audit the library.

Restoring From The Default Tape Device Attached to the ACSLS Server

To restore the ACSLS database from the default tape device attached to the ACSLS server after a database
failure, do the following:

1. Log in as acsss.

2. Shut down ACSLS:

kill.acsss

3. Shut down the database:

db_command stop

4. Ensure any existing redo log files are available to the database system.

If the hardware configuration includes only a primary disk and the primary disk fails, there may not be any redo
log files available. In this case, the database may only be restored to the point at which bdb.acsss was last
executed.

If the hardware configuration includes a second disk for database support and the primary disk fails, the redo log
files are available for restoration. The user, however, must verify the second disk is installed and its partition is
mounted. This can be verified by executing the df command from any window, then making sure the second disk
primary and backup partitions are mounted.

If the hardware configuration includes a second disk for database support and the second disk fails, you must
deinstall the second disk using sd_mgr.sh. No data is lost, but running bdb.acsss more frequently is advised.
When the second disk is replaced, rerun sd_mgr.sh to reinstall second disk support.

If the hardware configuration included a second disk for database support and both the primary and second disks
fail simultaneously, the redo log files may not be available for restoration.

5. Make sure the most current backup tape is write-protected, then insert it into the tape device.

6. Enter the following command:

rdb.acsss

Depending on the nature of the database problem, the utility may prompt you for additional information.

7. Wait for the following prompt:

After performing a successful Oracle database recovery
you should do a database backup. This will ensure your
ability to recover again should you have another database
failure. Do you want to do this database backup now?

8. Remove the tape from the drive and insert a new backup tape.

9. Enter "y" and press Return to start the backup.

10. Wait for the backup to complete. Remove the tape from the drive, write-protect it, label it with the current
date, and store it in a safe place.

11. To start ACSLS, enter the following command:

rc.acsss

Restoring From A Specified Tape Device Attached to the ACSLS Server

To restore the ACSLS database from a specified tape device attached to the ACSLS server after a database
failure, do the following:

1. Log in as acsss.

2. Shut down ACSLS:

kill.acsss

3. Shut down the database:

db_command stop

4. Ensure any existing redo log files are available to the database system.

If the hardware configuration includes only a primary disk and the primary disk fails, there may not be any redo
log files available. In this case, the database may only be restored to the point at which bdb.acsss was last
executed.

If the hardware configuration includes a second disk for database support and the primary disk fails, the redo log
files are available for restoration. The user, however, must verify the second disk is installed and its partition is
mounted. This can be verified by executing the df command from any window, then making sure the second disk
primary and backup partitions are mounted.

If the hardware configuration includes a second disk for database support and the second disk fails, you must
deinstall the second disk using sd_mgr.sh. No data is lost, but running bdb.acsss more frequently is advised.
When the second disk is replaced, rerun sd_mgr.sh to reinstall second disk support.

If the hardware configuration included a second disk for database support and both the primary and second disks
fail simultaneously, the redo log files may not be available for restoration.

5. Make sure the most current backup tape is write-protected, then insert it into the tape device.

6. Enter the following command:

rdb.acsss -f tape_device

Where tape_device specifies a tape device attached to the ACSLS server.

Depending on the nature of the database problem, the utility may prompt you for additional information.

7. Wait for the following prompt:

After performing a successful Oracle database recovery
you should do a database backup. This will ensure your
ability to recover again should you have another database
failure. Do you want to do this database backup now?

8. Remove the tape from the drive and insert a new backup tape.

9. Enter "y" and press Return to start the backup.

10. Wait for the backup to complete. Remove the tape from the drive, write-protect it, label it with the current
date, and store it in a safe place.

11. To start ACSLS, enter the following command:

rc.acsss

Example: To restore the ACSLS database from tape device /dev/rmt/2, enter the following command:

rdb.acsss -f /dev/rmt/2
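
The command sequence of the two tape procedures above can be summarized as a checklist. The sketch below only prints the commands in order and does not run them, because the real procedure includes manual steps such as verifying the redo log files, handling tapes, and answering prompts. The function name print_restore_plan is hypothetical.

```shell
# Sketch only: prints the commands from the tape restore procedures; it runs nothing.
print_restore_plan() {
    tape_device=${1:-}

    echo "kill.acsss"
    echo "db_command stop"
    # With no argument the default tape device is used; otherwise pass -f.
    if [ -n "$tape_device" ]; then
        echo "rdb.acsss -f $tape_device"
    else
        echo "rdb.acsss"
    fi
    echo "rc.acsss"
}
```

Calling print_restore_plan /dev/rmt/2 lists the sequence for a specified device, and calling it with no argument lists the sequence for the default device.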

Restoring From Specified Files

Caution: rdb.acsss does not check the contents of the file to verify that it contains the backed-up database and
miscellaneous library resource files. Make sure you specify the correct file!

To restore the ACSLS database from a specified file, do the following:

1. Log in as acsss.

2. Shut down ACSLS:

kill.acsss

3. Shut down the database:

db_command stop

4. Ensure any existing redo log files are available to the database system.

If the hardware configuration includes only a primary disk and the primary disk fails, there may not be any redo
log files available. In this case, the database may only be restored to the point at which bdb.acsss was last
executed.

If the hardware configuration includes a second disk for database support and the primary disk fails, the redo log
files are available for restoration. The user, however, must verify the second disk is installed and its partition is
mounted. This can be verified by executing the df command from any window, then making sure the second disk
primary and backup partitions are mounted.

If the hardware configuration includes a second disk for database support and the second disk fails, you must
deinstall the second disk using sd_mgr.sh. No data is lost, but running bdb.acsss more frequently is advised.
When the second disk is replaced, rerun sd_mgr.sh to reinstall second disk support.

If the hardware configuration included a second disk for database support and both the primary and second disks
fail simultaneously, the redo log files may not be available for restoration.

5. Enter the following command:

rdb.acsss -f db_file

Where:

db_file specifies a UNIX file that contains the ACSLS database backup.

6. Wait for the following prompt:

After performing a successful Oracle database recovery
you should do a database backup. This will ensure your
ability to recover again should you have another database
failure. Do you want to do this database backup now?

7. Insert a new backup tape into the backup device.

8. Enter "y" and press Return to start the backup.

9. Wait for the backup to complete. Remove the tape from the drive, write-protect it, label it with the current
date, and store it in a safe place.

10. To start ACSLS, enter the following command:

rc.acsss
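
Because rdb.acsss does not verify the file you name, a basic sanity check before step 5 can catch an obviously wrong path. The following is a minimal sketch: check_backup_file is a hypothetical name, and the check confirms only that the file exists, is readable, and is not empty. It cannot confirm that the file really contains an ACSLS database backup.

```shell
# Sketch only: check_backup_file is a hypothetical helper. It does not inspect
# the file contents, so it cannot prove the file is a valid backup.
check_backup_file() {
    db_file=${1:-}

    if [ -z "$db_file" ]; then
        echo "usage: check_backup_file FILE" >&2
        return 2
    fi
    if [ ! -f "$db_file" ]; then
        echo "not a regular file: $db_file" >&2
        return 1
    fi
    if [ ! -r "$db_file" ]; then
        echo "file is not readable: $db_file" >&2
        return 1
    fi
    if [ ! -s "$db_file" ]; then
        echo "file is empty: $db_file" >&2
        return 1
    fi

    # Show size and modification time so you can confirm it is the backup you expect.
    ls -l "$db_file"
}
```

A typical use would be to run check_backup_file db_file first and to run rdb.acsss -f db_file only if the check succeeds and the listed date and size match the backup you intend to restore.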

Restoring a Corrupted Database

The ACSLS database structure may be corrupted if you are experiencing all of the following problems:

• You cannot do basic ACSLS functions, such as mounts, dismounts, enters, and ejects.

• You are receiving database failure messages.

• Restoring the database using the most current backup does not fix the problem.

If the ACSLS database is corrupted, you must delete the database, reinstall the database, reconfigure ACSLS, and run
an audit.

To restore a corrupted database, do the following:

1. Open a command tool and log in as acssa.

2. From the cmd_proc window, idle ACSLS:

idle

3. Open another command tool and log in as acsss.

4. Terminate ACSLS and shut down the database:

kill.acsss

db_command stop

db_command abort

5. Log out as acsss and log in as root.

6. Change to the /export/home directory:

cd /export/home

7. Delete the existing database:

rm -r oracle

8. Insert the ACSLS distribution media and change to the installation directory.

For more information, see the ACSLS Installation and Configuration Guide for your platform and version.

9. Start the installation script:

./install.sh

10. Enter y to respond to the prompt.

/export/home/ACSSS files exist, install will not overlay an
existing ACSLS system. To re-install the ACSLS
software, exit install and remove the /export/home/ACSSS
directory hierarchy. Continue with install? (If you answer
yes, the ACSLS distribution area will be unchanged.)

Status messages appear as the database is installed.

11. When the "installation complete" message appears, remove the distribution media.

12. Respond to the prompts for database backup directory, automatic startup, and adding a modem.

For more information, see the ACSLS Installation and Configuration Guide for your platform and version.

13. At the next prompt, press <CTRL> + <c> to exit the installation script.

14. Log out as root and log in as acsss.

15. Run acsss_config to reconfigure ACSLS.

For more information, see the ACSLS Installation and Configuration Guide for your platform and version.

16. Start ACSLS:

rc.acsss

17. Audit the entire library.

For more information, see "Auditing the Library".

18. Run bdb.acsss to back up the database.

For more information, see "Doing Manual Backups to Tape".

Recovering From Primary Disk Failure (Second Disk Installed)

If you have a second server disk installed, use this procedure to recover from a primary disk failure.

To recover from a primary disk failure, do the following:

1. Install the new primary disk according to the manufacturer's instructions.

2. Install the operating system and ACSLS.

For more information, see the ACSLS Installation and Configuration Guide for your platform and version.

3. Log out as acsss and log in as root.

4. Create the mount point for the /second_disk directory:

mkdir /second_disk

The mount point for /second_disk/backup should already exist.

5. Add entries for the second disk and second disk backup mount points to the /etc/vfstab file.

The following shows examples of these entries.

/dev/dsk/c0t1d0s0 /dev/rdsk/c0t1d0s0 /second_disk ufs 4 yes -

/dev/dsk/c0t1d0s1 /dev/rdsk/c0t1d0s1 /second_disk/backup ufs 5 yes -

6. Mount these file systems:

mount -a

7. Log out as root and log in as acsss.

8. Configure ACSLS.

For more information, see the ACSLS Installation and Configuration Guide for your platform and version.

9. Run rdb.acsss to restore the database.

For more information, see "How to Restore the Database". The redo log files residing on the second disk will be
applied to the backup version to bring the database up to date.

10. Run bdb.acsss to back up the restored database.

For more information, see "Doing Manual Backups to Tape".
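
A mistyped /etc/vfstab entry in step 5 will cause the mount in step 6 to fail, so it can be worth checking each new line before you run mount -a. The sketch below is an illustration under stated assumptions: check_vfstab_line is a hypothetical name, and it assumes the seven-field Solaris vfstab layout (device to mount, device to fsck, mount point, file system type, fsck pass, mount at boot, mount options) with local disk slices under /dev/dsk and /dev/rdsk.

```shell
# Sketch only: check_vfstab_line is a hypothetical helper for local UFS disk slices.
check_vfstab_line() {
    printf '%s\n' "$1" | awk '
        NF != 7                { print "expected 7 fields, found " NF; exit 1 }
        $1 !~ /^\/dev\/dsk\//  { print "unexpected device to mount: " $1; exit 1 }
        $2 !~ /^\/dev\/rdsk\// { print "unexpected device to fsck: " $2; exit 1 }
        $3 !~ /^\//            { print "mount point is not absolute: " $3; exit 1 }
    '
}
```

The function returns a non-zero status and prints the first problem it finds, so a slip such as a period in place of a slash in the device path is reported before the file system fails to mount.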
