The heatsink on one of the CPUs had a fan that wasn’t working. I replaced it with the spare heatsink from the new V210 that is replacing PACO. The V210 came with 2 CPUs as part of an educational discount, and our Oracle license will only allow us one CPU.
The standalone DLT4000 (Dell PowerVault 110T) tape drive that we used for backing up the catalog was exhibiting problems yesterday and EYEWI was refusing to recognize its presence on the SCSI chain. I replaced the drive with a DLT1 drive, reconfiguring NetBackup in the process (NetBackup doesn’t directly use the drive, but it does know about it).
I also removed the Overland DLT library from the system and powered it off, again telling NetBackup about the changes. I’ve removed most of the DLT media from the media manager except for a few tapes that seem to be “assigned” and can’t be deleted.
The primary change to EYEWI was the introduction of a SpectraLogic T120 tape library in fall 2004.
The T120 (Superfrog) is SCSI-attached and contains two LTO2 tape drives. It currently has the default of 30 slots available. The Overland DLT library is still attached but unused. The standalone DLT drive is still the local backup drive. We are in the process of erasing all previous DLT tapes for resale.
The two 36 GB disks from ROJ that were being used for AMANDA backups are now on EYEWI and being used as a disk storage unit for system information in NetBackup.
The /data partition on EYEWI is a disk storage unit that is used for backing up the system information on the servers. Previously it was a 21 GB mirror using slice 7 of the two primary system disks. It is now a RAID 5 device using slice 7 of all four disks in the system, making it a 63 GB volume. The two new disks have the same partitioning as the two current disks, giving us about 20 GB of other storage for scratch use, if necessary.
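The capacity change follows from the usual mirror and RAID 5 arithmetic. A minimal sketch, assuming each slice 7 is about 21 GB (the function names are mine, for illustration only):

```python
def mirror_capacity_gb(slice_gb: float) -> float:
    """A two-way mirror stores one copy's worth of data."""
    return slice_gb


def raid5_capacity_gb(slice_gb: float, disks: int) -> float:
    """RAID 5 gives up one slice's worth of space to parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three members")
    return slice_gb * (disks - 1)


SLICE_GB = 21  # assumed size of slice 7 on each disk

print(mirror_capacity_gb(SLICE_GB))    # old layout: two-disk mirror
print(raid5_capacity_gb(SLICE_GB, 4))  # new layout: four-disk RAID 5
```

With four 21 GB slices, one slice's worth goes to parity and three remain for data, which matches the 63 GB figure.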
EYEWI suffered a system drive failure on Saturday morning. On Monday afternoon and evening I replaced the drive and rebuilt the system.
First off, make sure your volume manager really is mirroring all the partitions on the system.
Recovery proceeded fairly straightforwardly. Since only the root partition had actually been mirrored, the system would not boot with the second drive. I used one of the spare drives from ROJ's AMANDA installation to copy the root partition of the surviving drive and to test restoring the /home partition from the dump that happens with every successful NetBackup catalog backup.
I Jumpstarted the system with the surviving drive in its original location (slot 1) and the other of ROJ's spare AMANDA drives in slot 0. After Jumpstart, I restored the /home partition from the catalog tape, moved /home/opt to a temporary location, and installed NetBackup (in order to get the appropriate files in /etc). I moved the new /home/opt out of the way and moved the old /home/opt into position, and NetBackup seemed almost happy.
Details that needed to be corrected: the Dell PowerVault 110T and the Overland tape library had swapped device files - /dev/rmt/0* now points to the tape library while /dev/rmt/1* points to the 110T. I needed to use the Java GUI on NetBackup to change the device files for the two storage units. I had also originally forgotten to increase the shared memory configuration in /etc/system, and thus backups failed with error code 11 (failed system call).
There are still a few other details to be worked out, but the Jumpstart and /home restore seem to have gotten the bulk of it.
AMANDA’s limitation is that it cannot span tapes. As we are now backing up 60 GB partitions that are 90% full, this leads to an inefficient use of tape space - the extra space at the end of a 40/80 GB tape can’t be used as there are no small filesystems to put there. As a result, it takes more than 8 tapes to make a full backup of the entire server set where it should take no more than 5.
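To make the tape-spanning cost concrete, here is a rough sketch comparing the two cases. The partition list is hypothetical (seven 60 GB partitions at 90% full, not our actual server set), the 80 GB figure assumes the tape's full compressed capacity is reached, and the first-fit packing is only an illustration, not how AMANDA actually schedules dumps:

```python
import math


def tapes_without_spanning(dumps_gb, tape_gb):
    """First-fit decreasing: every dump must fit entirely on one tape."""
    free = []  # remaining space on each tape used so far
    for size in sorted(dumps_gb, reverse=True):
        if size > tape_gb:
            raise ValueError("a single dump is larger than one tape")
        for i, room in enumerate(free):
            if size <= room:
                free[i] = room - size
                break
        else:
            # No existing tape has room, so start a new one.
            free.append(tape_gb - size)
    return len(free)


def tapes_with_spanning(dumps_gb, tape_gb):
    """With spanning, only the total volume matters."""
    return math.ceil(sum(dumps_gb) / tape_gb)


partitions = [54] * 7  # hypothetical: 60 GB partitions, 90% full
TAPE_GB = 80           # assumed compressed capacity of a 40/80 GB tape

print(tapes_without_spanning(partitions, TAPE_GB))
print(tapes_with_spanning(partitions, TAPE_GB))
```

Because two 54 GB dumps do not fit on one 80 GB tape, each dump takes a whole tape and the remaining 26 GB is wasted, whereas a spanning backup only needs enough tapes to hold the total.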
NetBackup is resident on a new server: EYEWI, a SunFire V240 running Solaris 9. We have moved the tape library to this server and have also attached the DLT4000 drive that we had gotten with PAX originally. The main backups are to the library while the NetBackup catalog backups are to the DLT4000.
NetBackup requires a distinctly different mindset than AMANDA in its conception of backup schedules and retention times. Where AMANDA uses a tape cycle concept to implement a retention period (that is, you have _N_ tapes with a cycle of _M_ days, thereby deriving a full backup frequency and retention period), NetBackup is perhaps less intelligent. Instead, you define the frequency and windows for full and incremental backups, and you also define an explicit retention period for these backups. This, together with a client list and a file list, makes up the NetBackup concept of a policy.
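The pieces of a policy described above can be pictured as a small data structure. This is only a conceptual sketch in Python, not NetBackup's API or configuration format; the class names, field names, policy name, paths, and all the values are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Schedule:
    kind: str                # "full" or "incremental"
    frequency: timedelta     # how often this backup should run
    window_start_hour: int   # hour of day the backup window opens
    window_hours: int        # length of the window in hours
    retention: timedelta     # explicit retention period

    def expires(self, taken_at: datetime) -> datetime:
        """Retention is explicit, not derived from a tape cycle."""
        return taken_at + self.retention


@dataclass
class Policy:
    name: str
    clients: list
    file_list: list
    schedules: list = field(default_factory=list)


# Hypothetical example values, not our real configuration.
full = Schedule("full", timedelta(days=7), 22, 8, timedelta(days=90))
incr = Schedule("incremental", timedelta(days=1), 22, 8, timedelta(days=14))
policy = Policy("example-user", ["exampleclient"], ["/export/home"], [full, incr])

print(full.expires(datetime(2004, 1, 1)))
```

The point of the sketch is that frequency, window, and retention are each stated directly per schedule, rather than falling out of the number of tapes and the cycle length as they do in AMANDA.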
We have implemented NetBackup policies such that there are at most three policies per client and no two clients share a policy. While it is possible to put multiple similar clients into a single policy, we have few clients that are this similar, and keeping one client per policy is simpler to manage. We have defined policies for user, system, and business data. Some clients may not have some of these (for example, the WebDB server has no user data, and several of the Solaris servers have system data that is easily replicable from a Jumpstart). The user and business data are written to tape, and the system data is compressed and saved on a disk storage unit on EYEWI.
The catalog database is backed up to local disk after every successful policy backup. The post-backup script has been modified to make a snapshot of the filesystem and dump it to the DLT4000 tape.
Solaris and NetBackup require some tuning for optimum performance. Veritas has one document on Solaris kernel tuning for shared memory and IPC, and another on NetBackup shared memory and network performance tuning - following them gives better throughput on the network and tape systems. Putting EYEWI on gigabit ethernet also speeds up backups.