Recovery Scenarios Using the Rescue Image for ClearOS 6
If you've come to this page because you need the information contained here, let us begin by saying that we are sorry your system is not working and we hope this page will help. There are a number of reasons why you may need to use the rescue image. It is a valuable tool for anyone supporting ClearOS and as such is included on every ClearBOX appliance by default. Here are just some of the reasons why you may need to use the rescue image which is contained on your ClearOS installation CD/DVD/USB/ISO:
- Hard drive failure
- RAID problems
- GRUB problems
- Kernel boot issues
- Init boot issues
DON'T PANIC
Likely your problem is causing you stress, and stress can lead to extreme reactions which can destroy data. In this guide we will attempt to point out key validation points in the diagnosis, as well as ways to verify that the changes you made actually took. Some of this process is complex and this guide will NOT necessarily meet the needs of your particular problem. That being said, even if it doesn't fix the problem, you will know what your problem is NOT.
Establishing the point of failure
You may have multiple problems. For example, a failed disk will affect your RAID, GRUB, Kernel and init process. The boot of ClearOS goes through the following stages:
- BIOS/POST
- GRUB
- Kernel
- Init
Powering on your system will result in a series of tests. If your Power On Self Test (POST) completes, it will usually beep once and transition straight to the first sector of the first device as listed in your BIOS. This first sector should contain boot code called GRUB (GRand Unified Boot Loader). GRUB under ClearOS presents a menu item and a countdown timer. This will transition to a black screen which fills up quickly with text on 5.x, and to a graphical screen on 6.x. From here the kernel will load devices and hand over the boot process to the init scripts.
So where the process stops is key to understanding where to start fixing the issue. Error messages are critical; it is a good idea to write them down, or Google them if you don't understand WHERE the boot is failing.
Rescue Image
Starting the Rescue Image
All ClearOS installations contain a rescue image. To successfully start this image you need to tell your BIOS to boot from the installation media. This can require a modification of the boot order in your BIOS, or perhaps your BIOS supports a keystroke which allows you to select your boot device.
Options
The rescue CD will ask what language and keyboard you wish to use. It will ask you which media you want to get the rescue image from. If you did this from CD, choose CD/DVD. It will also ask you if you want to set up network preferences. Most of the time you will NOT need any network devices to fix any problems but occasionally you will. Skip this step unless you are sure you need it.
Next the system will attempt to find your ClearOS installation. It may ask you to initialize disks. If you have NOT added new disks then you should probably skip this step. In cases where you have added new disks, this is normal and they need to be initialized if you wish to use them with ClearOS.
If the system finds your ClearOS installation, it will mount the entire thing in /mnt/sysimage.
Lastly, we will need to get the shell so that we can issue commands.
Special circumstances
RAID issues
Checking partitions
You may be here in this document because you have lost the first disk in your RAID array and the system is either unbootable, or you need to use rescue mode for the repair instead of the regular OS. If this is the case you may have identified the bad disk and replaced it already, or perhaps you need to just look around and assess the damage. If you did add a new disk, the rescue CD will ask you to initialize that disk.
Survey the landscape by running the following and take an inventory of your physical disks and their partitions:
fdisk -l | less
Here is an example of what a RAID disk will look like from running that command:
Disk /dev/sda: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          15      120456   fd  Linux raid autodetect
/dev/sda2              16        7664    61440592+  fd  Linux raid autodetect
/dev/sda3            7665      243201  1891950952+  fd  Linux raid autodetect
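On a box with several disks the fdisk listing gets long. A quick sanity check is to count the RAID-member partitions (type fd, "Linux raid autodetect") with grep. This sketch runs the grep against a sample of the listing above rather than a live disk; on a real system you would pipe fdisk itself:

```shell
# Count RAID-member partitions. On a live system you would run:
#   fdisk -l | grep -c 'Linux raid autodetect'
# Here we grep a sample of the listing above instead.
sample='/dev/sda1   *           1          15      120456   fd  Linux raid autodetect
/dev/sda2              16        7664    61440592+  fd  Linux raid autodetect
/dev/sda3            7665      243201  1891950952+  fd  Linux raid autodetect'
printf '%s\n' "$sample" | grep -c 'Linux raid autodetect'
```

If the count on the surviving disk and the replacement disk differ, you are not done partitioning yet.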
Making partitions on the new disk to match the old
If you've replaced the disk that failed, you will notice that it will not have any partitions yet. Ideally, the replacement disk will have the same geometry as its surviving RAID member; if not, you will need to get a little more technical and ensure that the partition sizes either match or are greater than the originals:
255 heads, 63 sectors/track, 243201 cylinders
Write this information down. You will need it later so that you can set up the new disk with the same geometry and information. Of particular note are the start and end numbers, the partition number, the type, and lastly which partition has the asterisk '*' character (the bootable flag).
Locate your unformatted/unpartitioned disk and run the following (in this example our disk is /dev/sdb):
fdisk /dev/sdb
You will enter the fdisk menu system which will look like this:
[root@server ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 243201.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help):
There are several commands that you will use here, but to familiarize yourself with the tool, type 'm' on the keyboard and press Enter.
On your blank disk it will not show any partitions. Go ahead and let's make one. Type 'n' for new partition. It will ask you whether you want a primary or extended partition. You can have up to 4 primary partitions - or 3 primary partitions plus an extended partition which can hold many logical partitions. Typically the first partition will be primary. Type 'p' for primary. When it asks for which partition, use '1' for the first. It will ask for the start cylinder and by default will show [1]. If that matches your notes from the other drive, enter it. It will ask for the end cylinder; supply that as well. When that is complete, type 'p' to view your partition. Repeat this process for each partition.
You will likely need to change the type of each partition from the default 83 (Linux) to match the original disk (fd, Linux raid autodetect, in this example). If this is the case then do the following and supply the correct hex code:
Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Review your changes using the 'p' command.
You will likely need to set the bootable flag (the asterisk) on the correct partition. Do the following or similar:
Command (m for help): a
Partition number (1-4): 1
Review your changes and ensure that the information is correct. If you want to abort the proposed changes type 'q' to quit. If you want to write these changes and commit the partition proposal to disk, type 'w' for write.
Double-check your work by running fdisk -l or you can limit the results to just the disks that are part of your RAID by listing them in brackets like this:
fdisk -l /dev/sd[ab]
or this if you have 5 disks
fdisk -l /dev/sd[abcde] | less
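As an alternative to retyping every partition in fdisk, the sfdisk tool (usually present on a running ClearOS system, though not guaranteed in rescue mode) can dump one disk's partition table and replay it onto another. The sketch below only demonstrates the harmless text-rewriting half on a sample dump; the device names and dump contents are illustrative assumptions, not output from a real system:

```shell
# Rewrite device names in an sfdisk-style dump so the surviving disk's
# table can be replayed onto the replacement. On a live system:
#   sfdisk -d /dev/sda | sed 's|/dev/sda|/dev/sdb|' | sfdisk /dev/sdb
# (sample dump below; your real dump will differ)
dump='/dev/sda1 : start=       63, size=   240912, Id=fd, bootable
/dev/sda2 : start=   240975, size=122881185, Id=fd'
printf '%s\n' "$dump" | sed 's|/dev/sda|/dev/sdb|'
```

Always save the dump to a file first (sfdisk -d /dev/sda > /tmp/sda-table) so you have a record of the original layout before writing anything.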
RAID with MultiDisk
Checking MultiDisk Status
Familiarize yourself with this command:
cat /proc/mdstat
This command is useful for watching what your MultiDisk RAID is doing RIGHT now. Here is an output that shows one RAID volume:
[root@gateway-utah ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[0] sdb1[1]
      120384 blocks [2/2] [UU]

unused devices: <none>
Let's take this apart.
- This RAID is RAID 1:
- Personalities : [raid1]
- This RAID has one block device that is working:
- /dev/md1
- This RAID is made up of two partitions:
- /dev/sda1
- raid member [0]
- /dev/sdb1
- raid member [1]
- There are two disks in this array
- [2/2]
- There are two working disks in this array
- [2/2] (a failed member would report this: [2/1])
- Both drives are up
- [UU] (failed members will look like underscores: [U_])
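The underscore convention above can be checked mechanically: any '_' inside the status brackets means a missing member. This sketch tests the pattern against a sample of /proc/mdstat output; on a live system you would grep /proc/mdstat itself:

```shell
# Flag a degraded array: an '_' inside [..] means a failed/missing member.
# Live use: grep -q '\[U*_[U_]*\]' /proc/mdstat && echo "degraded"
# Sample below simulates a two-disk RAID 1 that lost its second member.
mdstat='md1 : active raid1 sda1[0]
      120384 blocks [2/1] [U_]'
if printf '%s\n' "$mdstat" | grep -q '\[U*_[U_]*\]'; then
  echo "degraded"
else
  echo "healthy"
fi
```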
It is also useful to watch this file, since it displays live status. Do this especially when you are rebuilding, to see the progress bar:
watch cat /proc/mdstat
Assembling your disks
Multidisk arrays are usually assembled by the /etc/mdadm.conf file. However, you are likely in this section because your RAID is not assembling…and how can it if mdadm.conf does not exist. Moreover, you CANNOT assemble disks in the rescue CD using a typical command like:
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
# ^^^^^^^^^^^^^^^ This won't work in Rescue Mode ^^^^^^^^^^^^^^^ #
Ok, so how do we assemble our disks? First, let's check our disk members (do this on all partitions which comprise your RAID):
mdadm --examine /dev/sda1
You should get results like this:
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 03e965cf:42e2070c:eeb11af9:065b0b59
  Creation Time : Wed Aug  4 11:56:07 2010
     Raid Level : raid1
  Used Dev Size : 120384 (117.58 MiB 123.27 MB)
     Array Size : 120384 (117.58 MiB 123.27 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Update Time : Thu Aug 30 10:40:57 2012
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 819e9b9b - correct
         Events : 27850

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       0        0        1      faulty removed
Check and make sure that the State is clean. If it is not clean you may have difficulties reassembling your array.
Now let's probe our disks and see what arrays we can find. Start by making a file in /etc/ called /etc/mdadm.conf. In it you will tell it which devices to scan:
DEVICE /dev/sd[abcd]1
DEVICE /dev/sd[abcd]2
DEVICE /dev/sd[abcd]3
In the above file, we will be scanning the first three partitions on four different drives for multidisk signatures. You will need to customize the above to suit your needs. Now, let's see what is there:
mdadm --examine --scan
This information is vital to assembling your array. If the output looks good, append this to your new /etc/mdadm.conf:
mdadm --examine --scan >> /etc/mdadm.conf
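After the append, /etc/mdadm.conf should contain both the DEVICE lines you wrote and the scanned ARRAY lines. Using the example UUIDs from this guide, a finished file might look like this (yours will have different UUIDs and device lists):

```
DEVICE /dev/sd[abcd]1
DEVICE /dev/sd[abcd]2
DEVICE /dev/sd[abcd]3
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=03e965cf:42e2070c:eeb11af9:065b0b59
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=c10a6566:11ce9088:da0e5da7:e1449030
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=6d83baec:d8c4f50b:3ccc3173:326118cf
```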
From here you can assemble your devices by name:
mdadm --assemble --scan /dev/md0
mdadm --assemble --scan /dev/md1
Now check to see your assembled RAID arrays:
cat /proc/mdstat
If this method does not work, you may have to try other means. Another way to see what is on your disks is to do an exhaustive probe and assemble things manually:
mdadm -QE --scan
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=03e965cf:42e2070c:eeb11af9:065b0b59 ARRAY /dev/md2 level=raid1 num-devices=2 UUID=c10a6566:11ce9088:da0e5da7:e1449030 ARRAY /dev/md3 level=raid1 num-devices=2 UUID=6d83baec:d8c4f50b:3ccc3173:326118cf
If you are familiar with Multidisk technology, you will notice that the output is very similar to the contents of the mdadm.conf file. In rescue mode, this information is critical because you can ONLY assemble disks using the UUID numbers.
Let's assemble md1:
mdadm --assemble --uuid 03e965cf:42e2070c:eeb11af9:065b0b59 /dev/md1
Notice that we do not put in the /dev/sdX1 disks. This is because the assemble will use the UUID, which should be the same on each member. You will notice that this UUID was present when we ran the 'mdadm --examine /dev/sda1' command.
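Typing a 32-hex-digit UUID by hand invites mistakes. As a convenience, awk can pull the UUID field out of the --examine output for you. This sketch runs the awk against a sample line mirroring the output shown earlier; on a live system you would pipe mdadm --examine into it:

```shell
# Extract the UUID field from mdadm --examine output.
# Live use: mdadm --examine /dev/sda1 | awk '/UUID :/ {print $3}'
examine='           UUID : 03e965cf:42e2070c:eeb11af9:065b0b59'
printf '%s\n' "$examine" | awk '/UUID :/ {print $3}'
```

You could then substitute it straight into the assemble command, e.g. mdadm --assemble --uuid "$(mdadm --examine /dev/sda1 | awk '/UUID :/ {print $3}')" /dev/md1.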
Now check the status using 'cat /proc/mdstat'.
Once the device is assembled, you can add the partitions that you created on your replacement disks (/dev/sdb1 in my example here):
mdadm --manage /dev/md1 --add /dev/sdb1
Now check the status using 'cat /proc/mdstat' or with 'watch cat /proc/mdstat'.
GRUB issues
Why me?
Sometimes your system won't boot. That just sucks. After the system completes the power on self tests, it will look to the first sector of the boot disk (the Master Boot Record) for information about booting. In ClearOS, it is looking for GRUB (GRand Unified Boot Loader).
GRUB scares people. It's true. It is a bit of voodoo magic that takes way too long to explain fully, but that is OK. We just need it to work. If you are in this section it is because you've used the rescue CD (a working bootable thing) to get your system up to the point that you can attempt to put GRUB on your disk. A typical reason why this would occur might be that your system has failed a mirrored disk and GRUB was only on that first disk, or perhaps it is suddenly broken on the second disk.
There is a really easy tool for fixing grub on your disk:
grub-install /dev/sda
Sadly, this won't work under normal conditions on the rescue CD because it isn't there…but it is on your ClearOS partition. Even then, you will need to get everything just so in order for it to work. This can be daunting so we are going to use the base level commands to get this going because they almost always work.
Getting to the commands we need
The commands you need to fix this are not on the rescue CD, but they are on your ClearOS partition. You will need to get your ClearOS partitions mounted. During the boot up of the Rescue CD it will ask you whether it should search for your partitions; you should let it. If it found them, it will tell you that it mounted them in /mnt/sysimage. If it found them, skip to the next section.
If it didn't find them, you are going to need to build your environment so that you can get to the commands. A common reason why the disks couldn't be seen is that it could not automatically assemble the members of a MultiDisk RAID array. If this is the case, go to the section entitled “RAID with MultiDisk” above and resolve the issue of assembling your disks. Then mount your partitions in their correct place. Some of these commands may be useful examples:
mkdir /mnt/sysimage
mount /dev/md2 /mnt/sysimage       # instead of md2 use whichever device is the / partition
mount /dev/md1 /mnt/sysimage/boot  # instead of md1 use whichever device is the /boot partition
You might also want to bind mount some other running partitions if you need the special devices.
mount -o bind /sys /mnt/sysimage/sys
mount -o bind /dev /mnt/sysimage/dev
mount -o bind /proc /mnt/sysimage/proc
If the rescue CD cannot see your disks because of drivers, or you run into other issues that prevent the mounting of your drives in the /mnt/sysimage folder, then you may need to get some specialized help or data recovery services.
A Changed Root Environment
The reason for putting all this stuff in /mnt/sysimage is that we are going to fake out the system and transplant our entire directory structure to /mnt/sysimage. We will use the 'chroot' command to do this.
chroot /mnt/sysimage
You will notice that once you execute this command, the entire directory structure is now transplanted to your hard disks. Check it out by running the ls command in directories that contain your stuff:
ls /home
ls /var/flexshare/shares
Grubby commands
Now that we are in the chroot environment we can do some cool grub things. Type grub to enter the grub console:
grub
Probing devices to guess BIOS drives. This may take a long time.

    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]

grub>
Imposing right? No need to fear, we'll make it easy here. GRUB is limited in some ways to a regular console but there are some cool things we can do. For example, we can find files on your partitions (if you know where they are). Try this find command out:
find /grub/menu.lst
grub> find /grub/menu.lst
 (hd0,0)
This command will show you EVERY partition on which /grub/menu.lst is found at that path. If it doesn't find it, try 'find /boot/grub/menu.lst'. Usually this is going to be found on hd0. It's important to note that GRUB references disks with hd and then a number, rather than the sdX or hdX method. Why, you ask? Because GRUB can be used to boot multiple types of operating systems, from Windows to Unix, so the hd nomenclature is just a neutral standard.
If you found the menu.lst file, it is an indication that the boot directory is likely on that drive. The definitive test would be to find the kernel and initrd files, but since this version of GRUB doesn't do that (GRUB 2 does, BTW, and is found on ClearBOX already, yay!), we will need to just make an assumption here that this is the right disk.
The next thing we want to do is actually setup GRUB on that partition. Run the following:
root (hd0,0)
Results:
grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
Notice the partition type 0x83. You may recognize that partition type 83 is Linux! Or rather, the partition type listed here will match that found in an 'fdisk -l' command for this disk.
Ok, now we will set up GRUB in the Master Boot Record (MBR: the first sector of a hard drive) area of the disk.
setup (hd0)
Ok, type quit to exit once the setup completes.
quit
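For the record, the interactive root/setup/quit session above can also be scripted: GRUB legacy accepts commands on standard input in --batch mode. The sketch below only builds the command file (the hd0 values are taken from the example above and are assumptions about your layout); on the repaired system you would then run grub --batch < /tmp/grub-cmds:

```shell
# Build a command file for GRUB legacy's batch mode.
# On the real system: grub --batch < /tmp/grub-cmds
cat > /tmp/grub-cmds <<'EOF'
root (hd0,0)
setup (hd0)
quit
EOF
cat /tmp/grub-cmds
```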
Now let's reboot and see what we get.
Troubleshooting
Sometimes booting GRUB doesn't go as planned. In this case, you may not be loading the boot configuration file. What this looks like is that you boot the server and you are given an imposing:
    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]

grub>
No need to fear. You can do a lot of cool things here. First find your grub.conf file.
find /grub/grub.conf
grub> find /grub/grub.conf
 (hd0,0)
Set your root to the returned value listed:
root (hd0,0)
grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
Then load the config file:
configfile /grub/grub.conf
This should give you your boot menu…HUZZAH!
Once you have booted your ClearOS system with your full environment, simply run:
grub-install /dev/sda
Where /dev/sda is your boot drive.
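If you want reassurance that the install took: GRUB legacy's stage1 embeds the string "GRUB" in the first 512 bytes of the disk, so reading just the MBR and grepping for it is a quick check. On the real system that would be dd if=/dev/sda bs=512 count=1 piped into grep; the sketch below demonstrates the same check against a fake MBR file so nothing touches a real disk:

```shell
# Sanity-check an MBR for the embedded "GRUB" marker.
# Live use: dd if=/dev/sda bs=512 count=1 2>/dev/null | grep -c GRUB
printf 'xx GRUB xx' > /tmp/fake-mbr    # stand-in for a real 512-byte MBR
dd if=/tmp/fake-mbr bs=512 count=1 2>/dev/null | grep -c GRUB
```

A count of 0 on the live disk would suggest grub-install did not actually write to that device.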