Recovery Scenarios Using the Rescue Image for ClearOS 6
If you've come to this page because you need the information contained here, let us begin by saying that we are sorry your system is not working and we hope this page will help. There are a number of reasons why you may need to use the rescue image. It is a valuable tool for anyone supporting ClearOS and as such is included on every ClearBOX appliance by default. Here are just some of the reasons why you may need to use the rescue image which is contained on your ClearOS installation CD/DVD/USB/ISO:
- Hard drive failure
- RAID problems
- GRUB problems
- Kernel boot issues
- Init boot issues
DON'T PANIC
Likely your problem is causing you stress, and stress can lead to extreme reactions which can destroy data. In this guide we will attempt to point out key validation points in the diagnosis, as well as ways to verify that the changes you made actually took. Some of this process is complex and this guide will NOT necessarily meet the needs of your particular problem. That being said, even if it doesn't fix the problem, you will know what your problem is NOT.
Establishing the point of failure
You may have multiple problems. For example, a failed disk will affect your RAID, GRUB, Kernel and init process. The boot of ClearOS goes through the following stages:
- BIOS/POST
- GRUB
- Kernel
- Init
Powering on your system will result in a series of tests. If your Power On Self Test (POST) completes, it will usually beep once and transition straight to the first sector of the first device as listed in your BIOS. This first sector should contain boot code called GRUB (GRand Unified Boot Loader). GRUB under ClearOS presents a menu item and a countdown timer. This will transition to a black screen which fills up quickly with text on 5.x, and to a graphical screen on 6.x. From here the kernel will load devices and hand over the boot process to the init scripts.
So where the process stops is key to understanding where to start fixing the issue. Error messages are critical; it is a good idea to write them down, or Google them if you don't understand WHERE the boot is failing.
Rescue Image
Starting the Rescue Image
All ClearOS installations contain a rescue image. To successfully start this image you need to tell your BIOS to boot from the installation media. This can require a modification of the boot order in your BIOS, or perhaps your BIOS supports a keystroke which allows you to select your boot device.
Options
The rescue CD will ask what language and keyboard you wish to use. It will ask you which media you want to get the rescue image from. If you did this from CD, choose CD/DVD. It will also ask you if you want to set up network preferences. Most of the time you will NOT need any network devices to fix any problems but occasionally you will. Skip this step unless you are sure you need it.
Next the system will attempt to find your ClearOS installation. It may ask you to initialize disks. If you have NOT added new disks then you should probably skip this step. In cases where you have added new disks, this is normal and they need to be initialized if you wish to use them with ClearOS.
If the system finds your ClearOS installation, it will mount the entire thing in /mnt/sysimage.
Lastly, we will need to get the shell so that we can issue commands.
Special circumstances
RAID issues
Checking partitions
You may be here in this document because you have lost the first disk in your RAID array and the system is either unbootable, or you need to use rescue mode for the repair instead of the regular OS. If this is the case you may have identified the bad disk and replaced it already, or perhaps you need to just look around and assess the damage. If you did add a new disk, the rescue CD will ask you to initialize that disk.
Survey the landscape by running the following and take an inventory of your physical disks and their partitions:
fdisk -l | less
Here is an example of what a RAID disk will look like from running that command:
Disk /dev/sda: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          15      120456   fd  Linux raid autodetect
/dev/sda2              16        7664    61440592+  fd  Linux raid autodetect
/dev/sda3            7665      243201  1891950952+  fd  Linux raid autodetect
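On a box with several disks the fdisk listing gets long. A quick sanity check is to count the RAID-member partitions (type fd, "Linux raid autodetect") with grep. This sketch runs the grep against a sample of the listing above rather than a live disk; on a real system you would pipe fdisk itself:

```shell
# Count RAID-member partitions. On a live system you would run:
#   fdisk -l | grep -c 'Linux raid autodetect'
# Here we grep a sample of the listing above instead.
sample='/dev/sda1   *           1          15      120456   fd  Linux raid autodetect
/dev/sda2              16        7664    61440592+  fd  Linux raid autodetect
/dev/sda3            7665      243201  1891950952+  fd  Linux raid autodetect'
printf '%s\n' "$sample" | grep -c 'Linux raid autodetect'
```

If the count on the surviving disk and the replacement disk differ, you are not done partitioning yet.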
Making partitions on the new disk to match the old
If you've replaced the disk that failed, you will notice that it will not have any partitions yet. Ideally, the replacement disk will have the same geometry as its surviving RAID member; if not, you will need to get a little more technical and ensure that the partition sizes either match or are greater than the originals:
255 heads, 63 sectors/track, 243201 cylinders
Write this information down. You will need it later so that you can set up the new disk with the same geometry and information. Of particular note are the start and end numbers, the partition number, the type, and lastly which partition has the asterisk '*' character (the bootable flag).
Locate your unformatted/unpartitioned disk and run the following (in this example our disk is /dev/sdb):
fdisk /dev/sdb
You will enter the fdisk menu system which will look like this:
[root@server ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 243201.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help):
There are several commands that you will use here, but to familiarize yourself with the tool, type 'm' on the keyboard and press Enter.
On your blank disk it will not show any partitions. Go ahead and let's make one. Type 'n' for new partition. It will ask you whether you want a primary or extended partition. You can have up to 4 primary partitions - or 3 primary partitions plus an extended partition which can hold many logical partitions. Typically the first partition will be primary. Type 'p' for primary. When it asks for which partition, use '1' for the first. It will ask for the start cylinder and by default will show [1]. If that matches your notes from the other drive, enter it. It will ask for the end cylinder; supply that as well. When that is complete, type 'p' to view your partition. Repeat this process for each partition.
You will likely need to change the type of each partition from the default 83 (Linux) to match the original disk (fd, Linux raid autodetect, in this example). If this is the case then do the following and supply the correct hex code:
Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Review your changes using the 'p' command.
You will likely need to set the bootable flag (the asterisk) on the correct partition. Do the following or similar:
Command (m for help): a
Partition number (1-4): 1
Review your changes and ensure that the information is correct. If you want to abort the proposed changes type 'q' to quit. If you want to write these changes and commit the partition proposal to disk, type 'w' for write.
Double-check your work by running fdisk -l or you can limit the results to just the disks that are part of your RAID by listing them in brackets like this:
fdisk -l /dev/sd[ab]
or this if you have 5 disks
fdisk -l /dev/sd[abcde] | less
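As an alternative to retyping every partition in fdisk, the sfdisk tool (usually present on a running ClearOS system, though not guaranteed in rescue mode) can dump one disk's partition table and replay it onto another. The sketch below only demonstrates the harmless text-rewriting half on a sample dump; the device names and dump contents are illustrative assumptions, not output from a real system:

```shell
# Rewrite device names in an sfdisk-style dump so the surviving disk's
# table can be replayed onto the replacement. On a live system:
#   sfdisk -d /dev/sda | sed 's|/dev/sda|/dev/sdb|' | sfdisk /dev/sdb
# (sample dump below; your real dump will differ)
dump='/dev/sda1 : start=       63, size=   240912, Id=fd, bootable
/dev/sda2 : start=   240975, size=122881185, Id=fd'
printf '%s\n' "$dump" | sed 's|/dev/sda|/dev/sdb|'
```

Always save the dump to a file first (sfdisk -d /dev/sda > /tmp/sda-table) so you have a record of the original layout before writing anything.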
RAID with MultiDisk
Checking MultiDisk Status
Familiarize yourself with this command:
cat /proc/mdstat
This command is useful for watching what your MultiDisk RAID is doing RIGHT now. Here is an output that shows one RAID volume:
[root@gateway-utah ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[0] sdb1[1]
      120384 blocks [2/2] [UU]

unused devices: <none>
Let's take this apart.
- This RAID is RAID 1:
- Personalities : [raid1]
- This RAID has one block device that is working:
- /dev/md1
- This RAID is made up of two partitions:
- /dev/sda1
- raid member [0]
- /dev/sdb1
- raid member [1]
- There are two disks in this array
- [2/2]
- There are two working disks in this array
- [2/2] (a failed member would report this: [2/1])
- Both drives are up
- [UU] (failed members will look like underscores: [U_])
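The underscore convention above can be checked mechanically: any '_' inside the status brackets means a missing member. This sketch tests the pattern against a sample of /proc/mdstat output; on a live system you would grep /proc/mdstat itself:

```shell
# Flag a degraded array: an '_' inside [..] means a failed/missing member.
# Live use: grep -q '\[U*_[U_]*\]' /proc/mdstat && echo "degraded"
# Sample below simulates a two-disk RAID 1 that lost its second member.
mdstat='md1 : active raid1 sda1[0]
      120384 blocks [2/1] [U_]'
if printf '%s\n' "$mdstat" | grep -q '\[U*_[U_]*\]'; then
  echo "degraded"
else
  echo "healthy"
fi
```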
It is also useful to watch this file, since it displays live status. Do this especially when you are rebuilding, to see the progress bar:
watch cat /proc/mdstat
Assembling your disks
Multidisk arrays are usually assembled by the /etc/mdadm.conf file. However, you are likely in this section because your RAID is not assembling…and how can it if mdadm.conf does not exist. Moreover, you CANNOT assemble disks in the rescue CD using a typical command like:
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
# ^^^^^^^^^^^^^^^ This won't work in Rescue Mode ^^^^^^^^^^^^^^^ #
Ok, so how do we assemble our disks? First, let's check our disk members (do this on all partitions which comprise your RAID):
mdadm --examine /dev/sda1
You should get results like this:
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 03e965cf:42e2070c:eeb11af9:065b0b59
  Creation Time : Wed Aug  4 11:56:07 2010
     Raid Level : raid1
  Used Dev Size : 120384 (117.58 MiB 123.27 MB)
     Array Size : 120384 (117.58 MiB 123.27 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Update Time : Thu Aug 30 10:40:57 2012
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 819e9b9b - correct
         Events : 27850

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       0        0        1      faulty removed
Check and make sure that the State is clean. If it is not clean you may have difficulties reassembling your array.
Now let's probe our disks and see what arrays we can find. Start by making a file in /etc/ called /etc/mdadm.conf. In it you will tell it which devices to scan:
DEVICE /dev/sd[abcd]1
DEVICE /dev/sd[abcd]2
DEVICE /dev/sd[abcd]3
In the above file, we will be scanning the first three partitions on four different drives for multidisk signatures. You will need to customize the above to suit your needs. Now, let's see what is there:
mdadm --examine --scan
This information is vital to assembling your array. If the output looks good, append this to your new /etc/mdadm.conf:
mdadm --examine --scan >> /etc/mdadm.conf
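After the append, /etc/mdadm.conf should contain both the DEVICE lines you wrote and the scanned ARRAY lines. Using the example UUIDs from this guide, a finished file might look like this (yours will have different UUIDs and device lists):

```
DEVICE /dev/sd[abcd]1
DEVICE /dev/sd[abcd]2
DEVICE /dev/sd[abcd]3
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=03e965cf:42e2070c:eeb11af9:065b0b59
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=c10a6566:11ce9088:da0e5da7:e1449030
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=6d83baec:d8c4f50b:3ccc3173:326118cf
```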
From here you can assemble your devices by name:
mdadm --assemble --scan /dev/md0
mdadm --assemble --scan /dev/md1
Now check to see your assembled RAID arrays:
cat /proc/mdstat
If this method does not work, you may have to try other means. Another way to see what is on your disks is to do an exhaustive probe and assemble things manually:
mdadm -QE --scan
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=03e965cf:42e2070c:eeb11af9:065b0b59 ARRAY /dev/md2 level=raid1 num-devices=2 UUID=c10a6566:11ce9088:da0e5da7:e1449030 ARRAY /dev/md3 level=raid1 num-devices=2 UUID=6d83baec:d8c4f50b:3ccc3173:326118cf
If you are familiar with Multidisk technology, you will notice that the output is very similar to the contents of the mdadm.conf file. In rescue mode, this information is critical because you can ONLY assemble disks using the UUID numbers.
Let's assemble md1:
mdadm --assemble --uuid 03e965cf:42e2070c:eeb11af9:065b0b59 /dev/md1
Notice that we do not put in the /dev/sdX1 disks. This is because the assemble will use the UUID, which should be the same on each member. You will notice that this UUID was present when we ran the 'mdadm --examine /dev/sda1' command.
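Typing a 32-hex-digit UUID by hand invites mistakes. As a convenience, awk can pull the UUID field out of the --examine output for you. This sketch runs the awk against a sample line mirroring the output shown earlier; on a live system you would pipe mdadm --examine into it:

```shell
# Extract the UUID field from mdadm --examine output.
# Live use: mdadm --examine /dev/sda1 | awk '/UUID :/ {print $3}'
examine='           UUID : 03e965cf:42e2070c:eeb11af9:065b0b59'
printf '%s\n' "$examine" | awk '/UUID :/ {print $3}'
```

You could then substitute it straight into the assemble command, e.g. mdadm --assemble --uuid "$(mdadm --examine /dev/sda1 | awk '/UUID :/ {print $3}')" /dev/md1.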
Now check the status using 'cat /proc/mdstat'.
Once the device is assembled, you can add the partitions that you created on your replacement disks (/dev/sdb1 in my example here):
mdadm --manage /dev/md1 --add /dev/sdb1
Now check the status using 'cat /proc/mdstat' or with 'watch cat /proc/mdstat'.
GRUB issues
Why me?
Sometimes your system won't boot. That just sucks. After the system completes the power on self tests, it will look to the first sector of the boot disk (the Master Boot Record) for information about booting. In ClearOS, it is looking for GRUB (GRand Unified Boot Loader).
GRUB scares people. It's true. It is a bit of voodoo magic that takes way too long to explain fully, but that is OK. We just need it to work. If you are in this section it is because you've used the rescue CD (a working bootable thing) to get your system up to the point that you can attempt to put GRUB on your disk. A typical reason why this would occur might be that your system has failed a mirrored disk and GRUB was only on that first disk, or perhaps it is suddenly broken on the second disk.
There is a really easy tool for fixing grub on your disk:
grub-install /dev/sda
Sadly, this won't work under normal conditions on the rescue CD because it isn't there…but it is on your ClearOS partition. Even then, you will need to get everything just so in order for it to work. This can be daunting so we are going to use the base level commands to get this going because they almost always work.
Getting to the commands we need
The commands you need to fix this are not on the rescue CD, but they are on your ClearOS partition. You will need to get your ClearOS partitions mounted. During the boot up of the Rescue CD it will ask you whether it should search for your partitions; you should let it. If it found them, it will tell you that it mounted them in /mnt/sysimage. If it found them, skip to the next section.
If it didn't find them, you are going to need to build your environment so that you can get to the commands. A common reason why the disks couldn't be seen is that it could not automatically assemble the members of a MultiDisk RAID array. If this is the case, go to the section entitled “RAID with MultiDisk” above and resolve the issue of assembling your disks. Then mount your partitions in their correct place. Some of these commands may be useful examples:
mkdir /mnt/sysimage
mount /dev/md2 /mnt/sysimage       # instead of md2 use whichever device is the / partition
mount /dev/md1 /mnt/sysimage/boot  # instead of md1 use whichever device is the /boot partition
You might also want to bind mount some other running partitions if you need the special devices.
mount -o bind /sys /mnt/sysimage/sys
mount -o bind /dev /mnt/sysimage/dev
mount -o bind /proc /mnt/sysimage/proc
If the rescue CD cannot see your disks because of drivers, or you run into other issues that prevent the mounting of your drives in the /mnt/sysimage folder, then you may need to get some specialized help or data recovery services.
A Changed Root Environment
The reason for putting all this stuff in /mnt/sysimage is that we are going to fake out the system and transplant our entire directory structure to /mnt/sysimage. We will use the 'chroot' command to do this.
chroot /mnt/sysimage
You will notice that once you execute this command, the entire directory structure is now transplanted to your hard disks. Check it out by running the ls command in directories that contain your stuff:
ls /home
ls /var/flexshare/shares
Grubby commands
Now that we are in the chroot environment we can do some cool grub things. Type grub to enter the grub console:
grub
Probing devices to guess BIOS drives. This may take a long time.

    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]

grub>
Imposing right? No need to fear, we'll make it easy here. GRUB is limited in some ways to a regular console but there are some cool things we can do. For example, we can find files on your partitions (if you know where they are). Try this find command out:
find /grub/menu.lst
grub> find /grub/menu.lst
 (hd0,0)
This command will show you EVERY partition on which /grub/menu.lst is found at that path. If it doesn't find it, try 'find /boot/grub/menu.lst'. Usually this is going to be found on hd0. It's important to note that GRUB references disks with hd and then a number, rather than the sdX or hdX method. Why, you ask? Because GRUB can be used to boot multiple types of operating systems, from Windows to Unix, so the hd nomenclature is just a neutral standard.
If you found the menu.lst file, it is an indication that the boot directory is likely on that drive. The definitive test would be to find the kernel and initrd files, but since this version of GRUB doesn't do that (GRUB 2 does, BTW, and is found on ClearBOX already, yay!), we will need to just make an assumption here that this is the right disk.
The next thing we want to do is actually setup GRUB on that partition. Run the following:
root (hd0,0)
Results:
grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
Notice the partition type 0x83. You may recognize that partition type 83 is Linux! Or rather, the partition type listed here will match that found in an 'fdisk -l' command for this disk.
Ok, now we will set up GRUB in the Master Boot Record (MBR: the first sector of a hard drive) area of the disk.
setup (hd0)
Ok, type quit to exit once the setup completes.
quit
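For the record, the interactive root/setup/quit session above can also be scripted: GRUB legacy accepts commands on standard input in --batch mode. The sketch below only builds the command file (the hd0 values are taken from the example above and are assumptions about your layout); on the repaired system you would then run grub --batch < /tmp/grub-cmds:

```shell
# Build a command file for GRUB legacy's batch mode.
# On the real system: grub --batch < /tmp/grub-cmds
cat > /tmp/grub-cmds <<'EOF'
root (hd0,0)
setup (hd0)
quit
EOF
cat /tmp/grub-cmds
```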
Now let's reboot and see what we get.
Troubleshooting
Sometimes booting GRUB doesn't go as planned. In this case, you may not be loading the boot configuration file. What this looks like is that you boot the server and you are given an imposing:
    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]

grub>
No need to fear. You can do a lot of cool things here. First find your grub.conf file.
find /grub/grub.conf
grub> find /grub/grub.conf
 (hd0,0)
Set your root to the returned value listed:
root (hd0,0)
grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
Then load the config file:
configfile /grub/grub.conf
This should give you your boot menu…HUZZAH!
Once you have booted your ClearOS system with your full environment, simply run:
grub-install /dev/sda
Where /dev/sda is your boot drive.
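If you want reassurance that the install took: GRUB legacy's stage1 embeds the string "GRUB" in the first 512 bytes of the disk, so reading just the MBR and grepping for it is a quick check. On the real system that would be dd if=/dev/sda bs=512 count=1 piped into grep; the sketch below demonstrates the same check against a fake MBR file so nothing touches a real disk:

```shell
# Sanity-check an MBR for the embedded "GRUB" marker.
# Live use: dd if=/dev/sda bs=512 count=1 2>/dev/null | grep -c GRUB
printf 'xx GRUB xx' > /tmp/fake-mbr    # stand-in for a real 512-byte MBR
dd if=/tmp/fake-mbr bs=512 count=1 2>/dev/null | grep -c GRUB
```

A count of 0 on the live disk would suggest grub-install did not actually write to that device.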