
Stupid PERC6 Tricks

Ghetto-cloning

I've discovered, quite by accident, a reliable way to clone one system to another using only the SAS hard drive controller, the PERC6 (LSI Logic).

When a pair of blank drives is installed in a system, the PERC6 will 'claim' them and write some management metadata to them. This metadata describes the drive, its position in the enclosure, and its unique descriptors such as the WWN. If these drives are later moved behind another RAID controller, that controller will treat the configuration on them as a Foreign Configuration, or Foreign Config for short. By default, the PERC6 does nothing with such drives.

Under certain circumstances the PERC6 will automatically start a rebuild when an array is degraded. Experience has shown that the RAID controller 'does the right thing' when a bad drive is removed and replaced with a blank drive. It will also rebuild on its own when an array is degraded and a dedicated or global hot spare becomes available. In these cases the PERC6 recovers without a reboot and without a trip into the RAID BIOS. However, if a drive contains data or a previous configuration, the PERC6 will neither import that configuration automatically nor overwrite the data. For situations that call for this kind of manual intervention, the user has to enter the RAID BIOS to change the settings.
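The automatic-rebuild rules above boil down to a small decision table. The sketch below encodes them in Python purely as an illustration; the names `DriveContents` and `controller_action` are hypothetical, and the logic is a simplification of the behaviour described in this post, not the PERC6 firmware's actual rules.

```python
from enum import Enum, auto


class DriveContents(Enum):
    """What a newly available drive carries (simplified, hypothetical model)."""
    BLANK = auto()               # no data and no RAID metadata
    HOT_SPARE = auto()           # a dedicated or global hot spare
    HAS_DATA_OR_CONFIG = auto()  # existing data or a foreign configuration


def controller_action(array_degraded: bool, drive: DriveContents) -> str:
    """Return what the controller does on its own in this toy decision table."""
    if not array_degraded:
        # Nothing to repair, so the drive is simply left alone.
        return "no action"
    if drive in (DriveContents.BLANK, DriveContents.HOT_SPARE):
        # Blank replacements and available hot spares trigger a rebuild.
        return "automatic rebuild"
    # Existing data or a foreign config is never imported or overwritten.
    return "no action: enter the RAID BIOS"


for degraded, drive in [
    (True, DriveContents.BLANK),
    (True, DriveContents.HAS_DATA_OR_CONFIG),
    (False, DriveContents.BLANK),
]:
    print(degraded, drive.name, "->", controller_action(degraded, drive))
```

The last branch is the one this whole post is about: a drive that already carries a configuration always falls through to manual work in the BIOS.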

In this example, I will be using three servers, Blade1, Blade2, and Blade3. These may be abbreviated "BL1," "BL2," and "BL3." For hard drive one, I will use "HD0" and for hard drive two I will use "HD1". Blade1's first hard drive is BL1HD0 and Blade 3's second hard drive is BL3HD1.

First, the desired image is created using Blade1, with both BL1HD0 and BL1HD1 loaded and configured as a mirrored RAID set. Once the image has been built and the drives are synchronized, the server is powered down.

At this point, all hard drives are removed and labeled with their current location. Blade 2 is booted with BL1HD0 in hard drive slot one. Blade 3 is started at the same time with BL1HD1 in slot two. The remaining slots are left empty.

Upon booting, the RAID BIOS notices a large change in configuration and offers the user the choice of importing the Foreign Configuration or entering the setup utility. Enter the setup utility.

In the setup utility, a new "tab" appears in the menu at the top: "Foreign Config". Within the utility, "F2" normally brings up a menu of options, and Alt+N switches between the pages (tabs).

On the VD (Virtual Drive) tab, one may notice that there is no configuration, or a missing configuration, at the controller level. Hitting "F2" here produces a menu with a number of options, one of which is "Foreign Config", with the sub-options "Import" and "Clear". Take note of this: disaster is a keystroke away, because choosing Clear instead of Import discards the configuration carried on the drive.

Selecting Import Foreign Config removes the original configuration from the controller's memory and puts the drive's configuration in its place. Immediately, the system should show that two hard drives are configured: one present and one missing.

At this point, one of each system's original hard drives should be reinserted into its empty slot: BL2HD1 goes into Blade 2, slot two, and BL3HD0 goes into Blade 3, slot one. Once the drives have spun up, press "F5" (Refresh).

The BIOS shows no change to the existing configuration. Switch to the "PD" (Physical Drive) tab with Alt+N and note that one drive is Online and the other is Ready. Flip over to the VD page and make sure the cursor is at the controller level. Hit "F2" and select "Clear Foreign Configuration". Switch back to the PD page and highlight the drive that is Ready. Press F2 and select "Make Global Hotspare". Select "Yes" to give priority to the enclosure; this will make the array rebuild faster. Page over to VD again and notice that the array is now rebuilding and a progress meter is present. Rebuilding continues until the recently installed original drive has been mirrored.

Once the mirroring process has finished, the "borrowed" drive should be pulled from the chassis. The RAID controller will mark the borrowed drive as "bad" and treat the remaining drive as the 'good' one.

Now the original drives should be returned to their respective systems and slots. BL1HD0 and BL1HD1 are reinstalled in Blade 1, BL2HD0 goes into Blade 2, slot one, and BL3HD1 goes into Blade 3, hard drive slot two. The process is then repeated: clear the foreign config, make the 'new' drive a Global Hot Spare, and let the rebuild complete.
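Because the drives pass through three chassis, it can help to write the swap schedule down as data before pulling anything. This is a minimal sketch; the `PLAN` structure and `check_plan` helper are hypothetical, and it assumes slots are numbered 0 (slot one) and 1 (slot two), with `None` for an empty slot. It checks that no drive is in two places at once, that HD0 and HD1 always sit in their own slots, and that every blade ends up holding only its own drives.

```python
# Hypothetical encoding of the swap schedule described above.
# Slot 0 is "slot one", slot 1 is "slot two"; None marks an empty slot.
PLAN = [
    ("phase 1: seed from Blade1's mirror", {
        "Blade2": {0: "BL1HD0", 1: None},
        "Blade3": {0: None, 1: "BL1HD1"},
    }),
    ("phase 2: rebuild onto a native drive", {
        "Blade2": {0: "BL1HD0", 1: "BL2HD1"},
        "Blade3": {0: "BL3HD0", 1: "BL1HD1"},
    }),
    ("phase 3: return borrowed drives, rebuild the rest", {
        "Blade1": {0: "BL1HD0", 1: "BL1HD1"},
        "Blade2": {0: "BL2HD0", 1: "BL2HD1"},
        "Blade3": {0: "BL3HD0", 1: "BL3HD1"},
    }),
]


def check_plan(plan):
    """Assert the invariants the procedure relies on."""
    for phase, blades in plan:
        placed = [d for slots in blades.values() for d in slots.values() if d]
        # A physical drive can only be in one chassis at a time.
        assert len(placed) == len(set(placed)), f"{phase}: a drive is used twice"
        for slots in blades.values():
            for slot, drive in slots.items():
                # HD0 always sits in slot one (0) and HD1 in slot two (1).
                assert drive is None or drive.endswith(f"HD{slot}"), (phase, drive)
    # After the final swap, every blade holds only its own drives.
    _, final = plan[-1]
    for blade, slots in final.items():
        prefix = blade.replace("Blade", "BL")
        assert all(d.startswith(prefix) for d in slots.values()), blade


check_plan(PLAN)
for phase, blades in PLAN:
    print(phase)
    for blade, slots in blades.items():
        print(f"  {blade}: slot one={slots[0]}, slot two={slots[1]}")
```

Running it prints the three phases; if a planned move breaks one of the invariants, `check_plan` fails with an `AssertionError` that names the phase.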

Blade 1 usually experiences some issues because its drives have been in a foreign server, so one of the drives may come up as foreign even after the foreign config has been imported. In that case, the 'bad' or 'missing' drive should be manually made a Global Hot Spare, and the PERC6 will start rebuilding it.

Once rebuilding has started, the systems may be booted; the mirroring operation continues until it completes, without significantly impacting operations.

Recovering From a Lost Configuration

A server with a PERC6 was found to have no configuration information on its controller. Talking with the admin, I learned that Import Foreign Config had already been performed. Since the server is remotely located, we knew nothing about the system had changed. The BIOS showed two hard drives in the "Ready" state and no virtual drive configured. The "Action" options were "Clear", "Create", and "Manage Persistent Cache"; "Import" was greyed out.

Since the only other option at this point was to rebuild the machine, the "Create" option was tried in an attempt to recreate the missing virtual drive. With everything else unchanged, as long as the drive geometry was the same, the system should boot. However, the BIOS would not let the administrator create a new virtual drive.

After some consternation and head scratching, we explored the "Manage Persistent Cache" option. There, the controller showed that it still held data in its cache from the previously defined virtual drive, and the on-screen text explained that while data is present in the cache, 'certain operations' will not be performed.

I made the call to wipe out the cache, and it was done. The administrator then recreated the Virtual Drive using the defaults, since that meant the least change and the best chance of not damaging the drive. The RAID controller now showed that a virtual drive existed with two hard drives in it, and reported that Drive 00 (Enclosure 0, Slot 0) had failed while Drive 01 was fine. I told them to boot the system, and it reached the login screen: back to normal, just one hard drive down.
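The trap in this recovery is that the cache, not the disks, was blocking the Create operation. A toy model makes the required ordering explicit. `ToyController` and its methods are hypothetical, and the rule that leftover persistent cache blocks Create is taken from the on-screen message described above, not from any controller documentation or API.

```python
class ToyController:
    """Toy model of the behaviour described above, not real PERC6 firmware."""

    def __init__(self, persistent_cache_for=None):
        # Name of the lost virtual drive whose data is still held in cache.
        self.persistent_cache_for = persistent_cache_for
        self.virtual_drives = []

    def discard_persistent_cache(self):
        # Irreversible: any writes held only in the cache are lost here.
        self.persistent_cache_for = None

    def create_virtual_drive(self, name, raid_level=1):
        # Assumption from the BIOS message: leftover cache blocks creation.
        if self.persistent_cache_for is not None:
            raise RuntimeError("persistent cache present: operation not allowed")
        self.virtual_drives.append((name, raid_level))


ctl = ToyController(persistent_cache_for="VD0")
try:
    ctl.create_virtual_drive("VD0")
except RuntimeError as err:
    print("Create refused:", err)
ctl.discard_persistent_cache()
ctl.create_virtual_drive("VD0")
print(ctl.virtual_drives)  # [('VD0', 1)]
```

Discarding the cache is the step that cannot be undone, which is why it deserves a deliberate decision rather than a reflexive keystroke.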

I hope your RAID hijinks aren't nearly as fun. I earned my fee: the trip it saved, and the triage time the server would otherwise have needed, would have cost more than my hourly rate.
