We have ClearOS 7 Community systems configured to boot with UEFI that are failing to boot after the latest GRUB update.
Error:
System BootOrder not found. Initializing defaults.
Creating boot entry "Boot0002" with label "ClearOS" for file "\EFI\clearos\shimx64.efi"
Failed to open \EFI\clearos\grubx64.efi -- Not Found
Failed to load image \EFI\clearos\grubx64.efi: Not Found
start_image() returned Not Found
StartImage failed: Not Found
I've determined that one of the latest GRUB updates moves grubx64.efi from /boot/efi/EFI/clearos/ to /boot/efi/EFI/centos/. The boot configuration still looks for grubx64.efi in the clearos directory, not the centos one created by the update, so boot fails.
I don't know precisely which of the following packages broke it, but the problem appeared after version 2.02-0.87.v7.6 of all the following were installed during a regular yum run:
grub2
grub2-common
grub2-efi-x64
grub2-pc
grub2-pc-modules
grub2-tools
grub2-tools-extra
grub2-tools-minimal
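For anyone wanting to confirm they have the same symptom before rebooting, here is a minimal sketch of a check. The function name check_grub_efi is made up for illustration, and it assumes the EFI System Partition is mounted at /boot/efi as on a stock install; pass a different path if yours is elsewhere.

```shell
# Report where grubx64.efi currently lives on the EFI System Partition.
# check_grub_efi is a hypothetical helper name; the two directory names
# are the ones mentioned in the post above.
check_grub_efi() {
    esp_root="${1:-/boot/efi}"
    clearos="$esp_root/EFI/clearos/grubx64.efi"
    centos="$esp_root/EFI/centos/grubx64.efi"
    if [ -f "$clearos" ]; then
        echo "ok: $clearos present"
    elif [ -f "$centos" ]; then
        echo "misplaced: found $centos but not $clearos"
        return 1
    else
        echo "missing: grubx64.efi is in neither directory"
        return 2
    fi
}

# Usage (on the affected system, as root):
#   check_grub_efi            # uses /boot/efi
#   check_grub_efi /mnt/esp   # if the partition is mounted elsewhere
```

A "misplaced" result matches the failure described above; "ok" suggests the system should find grubx64.efi on the next boot.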
Accepted Answer
Nick Howitt wrote:
Are you booting off a rescue/installation disk?
The threading on this forum is truly, um...
Anyway...
Being a bit of a Linux klutz, I missed step 2a on my first attempt!
1. Boot into the recovery environment (installation CD/flash drive, under the Troubleshooting menu).
2. Proceed with the mount sysimage option.
2a. Press ENTER.
3. chroot /mnt/sysimage
4. cp /boot/efi/EFI/centos/grubx64.efi /boot/efi/EFI/clearos/grubx64.efi
5. grub2-mkconfig -o /boot/efi/EFI/clearos/grub.cfg
6. exit
7. exit (again) -- system will reboot; remove recovery drive.
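Steps 4 and 5 above can be wrapped in a small script if you want to preview the commands first. This is only a sketch: restore_clearos_grub and run are made-up names, and the DRY_RUN switch is my own addition, not part of the original instructions. It is meant to be run inside the chroot from step 3.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Sketch of steps 4 and 5: copy grubx64.efi back into the clearos
# directory, then regenerate grub.cfg there.
restore_clearos_grub() {
    esp="${1:-/boot/efi}"
    src="$esp/EFI/centos/grubx64.efi"
    dst="$esp/EFI/clearos/grubx64.efi"
    if [ ! -f "$src" ]; then
        echo "source $src not found, nothing to copy" >&2
        return 1
    fi
    run cp "$src" "$dst" || return 1
    run grub2-mkconfig -o "$esp/EFI/clearos/grub.cfg"
}

# Usage inside the chroot:
#   DRY_RUN=1; restore_clearos_grub   # preview only
#   DRY_RUN=0; restore_clearos_grub   # actually copy and regenerate
```

Previewing first is cheap and makes it obvious if /boot/efi is not mounted, since the function stops when the source file is not there.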
Cheers
Responses (66)
Accepted Answer
This is what I have in "blkid". I left out the /dev/sdx partitions.
/dev/nvme0n1p1: SEC_TYPE="msdos" UUID="5C18-CE03" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="9a37b632-2be9-4726-ae20-0e3ebb52cdbd"
/dev/nvme0n1p2: UUID="5dedca90-5dd5-4210-b6aa-0f2d96c6b722" TYPE="xfs" PARTUUID="08dc394d-2e1d-4b35-ad89-0768f8b6745d"
/dev/nvme0n1p3: UUID="b719d753-6db1-4d69-a72e-97bfde3c1a7a" TYPE="xfs" PARTUUID="d34693a7-83a7-4a60-ba02-2d69d131fe50"
/dev/nvme0n1p4: UUID="b5035d6d-c564-4c2c-9e8b-d269c9cfd5fa" TYPE="swap" PARTUUID="09faf8f2-1cce-4db0-8e52-301201820ef6"
/dev/nvme0n1p5: UUID="okFsKQ-yTWf-CZ3J-Rbiw-2HDf-1A6k-2b7KFG" TYPE="LVM2_member" PARTUUID="d18a34a2-76de-4fe5-9452-e16b8fac05a2"
/dev/mapper/clearos-UserDisk1: LABEL="UserDisk1" UUID="7517634c-7217-43f5-8518-65fed898522d" TYPE="xfs"
/dev/mapper/clearos-UserDisk2: LABEL="UserDisk2" UUID="dedd44a1-a898-4f72-b1f4-3046bb7266f6" TYPE="xfs"
/dev/mapper/clearos-UserDisk3: LABEL="UserDisk3" UUID="38283ce5-1dd6-4b78-8ab8-449fa575824d" TYPE="xfs"
/dev/mapper/clearos-UserDisk4: LABEL="UserDisk4" UUID="d7714581-2ec5-43a3-9bf0-bcfc23a48db0" TYPE="xfs"
/dev/mapper/clearos-Razno: LABEL="Razno" UUID="8a069207-45ad-47f4-a0f7-fe725437a807" TYPE="xfs"
/dev/nvme0n1: PTTYPE="gpt"
This is what I have in /etc/fstab. I also left out the /dev/sdx partitions.
UUID=b719d753-6db1-4d69-a72e-97bfde3c1a7a / xfs defaults 0 0
UUID=5dedca90-5dd5-4210-b6aa-0f2d96c6b722 /boot xfs defaults 0 0
UUID=5C18-CE03 /boot/efi vfat umask=0077,shortname=winnt 0 0
/dev/mapper/clearos-Razno /var/flexshare/shares/backup xfs defaults 0 0
/dev/mapper/clearos-UserDisk1 /var/flexshare/shares/userdisk1 xfs defaults 0 0
/dev/mapper/clearos-UserDisk2 /var/flexshare/shares/userdisk2 xfs defaults 0 0
/dev/mapper/clearos-UserDisk3 /var/flexshare/shares/userdisk3 xfs defaults 0 0
/dev/mapper/clearos-UserDisk4 /var/flexshare/shares/userdisk4 xfs defaults 0 0
UUID=b5035d6d-c564-4c2c-9e8b-d269c9cfd5fa swap swap defaults 0 0 -
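Given an fstab like the one above, the mount commands for a rescue shell can be derived mechanically. The sketch below is an illustration only: fstab_mount_cmds is a made-up name, it only prints commands, and it assumes you have a readable copy of the fstab (for example from the root partition once that is mounted) and want everything under /mnt/sysimage.

```shell
# Print mount commands for /, /boot and /boot/efi, in that order,
# based on an fstab file. Nothing is mounted; the output is for review.
fstab_mount_cmds() {
    fstab="$1"
    target="${2:-/mnt/sysimage}"
    for mp in / /boot /boot/efi; do
        awk -v mp="$mp" -v t="$target" '
            $1 !~ /^#/ && $2 == mp {
                dir = (mp == "/") ? t : (t mp)
                printf "mount -t %s %s %s\n", $3, $1, dir
            }' "$fstab"
    done
}

# Usage:
#   fstab_mount_cmds /mnt/sysimage/etc/fstab
```

The order matters because /boot has to exist under the mounted root before /boot/efi can be mounted inside it. The filesystem type is passed explicitly with -t, taken from the third fstab column.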
Accepted Answer
Is there any chance of a copy of your "blkid" output that I can use as an example? I'll cut it down to just the NVMe drive and could munge the UUIDs. Similarly, can I have a copy of your /etc/fstab?
I'll try manually mounting partitions from a rescue disk in a VM, but it is BIOS only so only 2 partitions are needed, then see what happens if I try to chroot.
Note the rescue setup comes from CentOS and, possibly, Red Hat. It is not something we write. My NVMe was used for testing grub in my Proxmox server, but now it has VMs on it so I can't get to it any more.
Stick the exclude line at the end of yum.conf, but it can probably go anywhere. -
Accepted Answer
Hehe, thank you Marvin, you have a relaxing weekend too.
Thank you for clarifying, Nick. If it was your fault, then everything is forgiven. It was just a little bit of a "stressful" day.
I'm sorry for my late response, Nick, I didn't have time before.
Yes, it would be nice if you added this to the Rescue howto, in case someone else also has ClearOS on an NVMe SSD. If I may ask, why doesn't Rescue option number 1 work on an NVMe SSD? Will this be fixed, or is that not possible?
The answer to your questions is yes, I did it as you described. I also used the "lsblk -f" command to see the partitions, if that makes any difference.
I must add only one thing. It's not that I didn't want to chroot, I couldn't. It gave me an error.
If you have any other questions just ask. I hope that I will remember it correctly.
Thank you for the edit. I did the update, so I'm hoping that now it will be OK if I reboot. I won't try it until it is really necessary :P
I won't add the exclude for now, thank you for telling me how to do this. Just one question, in case I decide to add it later: do I add this right after [main]? -
Accepted Answer
@Flash
I am glad this is working now and, if possible, I'd like to write this up in the Rescue howto, but can you please clarify a few things:
1 - To get to the command prompt, you booted into rescue mode and when you got as far as the four menu options you chose option 3 -skip to shell
2 - When you skipped to shell you had the sh prompt
3 - disk identities and file systems were available from the "blkid" command
4 - In sh, you mounted your three disks into /, /boot and /boot/efi
5 - because of 4 you did not chroot
Personally, I think 4) is probably not a good idea, so in the write-up I'd suggest mounting into /mnt/sysimage and chrooting, but I'd add a note to the effect that you can mount the disks directly as you did.
I, unfortunately, was the "idiot" who released the grub2 update. It had not been updated for 3 years so it needed one. Because of its possibility of making a mess of things, I compiled it locally in mock and tested it on multiple systems including 2 EFI systems. This was a lot of hassle as one is a Proxmox server so I physically had to remove its disks and add in a new one to do ClearOS install and it worked fine. With it all working I pushed the build into the build system. This put the packages into the testing repo and I did a couple more updates but not an EFI one from that package. What I did not notice is that the build system was building the package slightly differently to the version compiled in mock and this is where it all went wrong. Next time (if there is one) I will fully test the package in updates-testing which will involve upgrading and downgrading whatever machines I have at the time. However, I doubt if there will be a next time unless there is a CVE against grub2. It is not a package we update very often. I only did it because I cannot get the grub2-mkconfig command to write a file on a BIOS (non-EFI) system and I was hoping this would fix it, but it does not. I have to redirect the output to file as it just appears on screen. So sorry for the mess I caused.
It is worth noting as Marvin pointed out, that the Community subscription always carries a bit of risk. Even with the most rigorous testing, bugs can always get through. Home and Business subscriptions get updated generally one or two weeks later than Community packages, with the exceptions of anything in the contribs repo and a few Business related packages which get released immediately. Grub2 is compiled by us, but you can get the same problem with packages we just pass on from Centos and EPEL without testing. This happened in June last year when a faulty microcode_ctl package was released upstream based on an update from Intel. It hit ClearOS, Centos, SLES and Debian distros at about the same time and for certain Intel processors it was fatal, and sometimes non-recoverable. I know at least one installation was lost because of this and here we are hostage to upstream fixes. Fixes came quickly because of the size of the issue, but again, Home and Business users did not have the issue because of their delayed releases.
I'd appreciate feedback to the questions.
@Marvin, I could not have done this without you either. Thanks.
[edit]
.... and if you really want to block future updates to grub2, add a line to /etc/yum.conf:
exclude=grub2*
But make sure you've got the updated package first (it should have updated overnight).
[/edit] -
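If you would rather script the yum.conf change than edit the file by hand, something like the following might do. It is a sketch under assumptions: add_grub2_exclude is a made-up name, it assumes [main] is the only section in the file (so appending at the end still lands in [main]), and it refuses to touch a file that already has a different exclude= line, since that line would need merging by hand.

```shell
# Append "exclude=grub2*" to a yum.conf-style file, once only.
add_grub2_exclude() {
    conf="${1:-/etc/yum.conf}"
    if grep -q '^exclude=.*grub2\*' "$conf"; then
        echo "already excluded"
    elif grep -q '^exclude=' "$conf"; then
        echo "existing exclude= line found, add grub2* to it by hand" >&2
        return 1
    else
        # Make sure the new line starts on its own line.
        if [ -n "$(tail -c1 "$conf")" ]; then
            echo >> "$conf"
        fi
        printf 'exclude=grub2*\n' >> "$conf"
        echo "added"
    fi
}

# Usage (as root):
#   add_grub2_exclude            # edits /etc/yum.conf
#   add_grub2_exclude ./test.conf
```

Remember that an exclude also blocks security fixes for grub2, so remove the line again if an update is ever needed.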
Accepted Answer
Flash,
Wonderful! Super-glad to hear you got it resolved!!!
So basically you mounted by hardware path instead of UUID there; no worries on that score that I can think of (it only mattered in the temporary recovery environment anyway).
As far as the "grub bug" goes, you've resolved the issue. My understanding is that basically *at the time* of the faulty grub update installation, the grubx64.efi file got moved to the wrong place. On the next reboot, when grub couldn't find the file, it failed and when trying to autorecover(? -- the Creating New Boot Entry 0002) it wrote a corrupted config file. By moving the file back to the right place and re-generating the grub config file, you've undone the damage. Nick had revoked the broken grub update shortly after I reported it, and then released a fixed version after he located the issue and we had a chance to test it out. The re-released (fixed) grub update does work with zero issues (at least on my systems!).
Regarding grub updates, I don't know how frequent they are; Nick would have to weigh in on that. But if you don't want to encounter these types of issues, you should seriously consider upgrading to ClearOS Business, which is a paid annual subscription. ClearOS Business wasn't affected by this: the updates are tested on Community first for a while before they get moved to the repositories that Business uses.
It's a tradeoff: Community is free as far as money goes, but you're testing updates for everyone else, and are on your own + this forum for support if there are problems. Business is an annual subscription, which gets only tested updates, and also has the option of paid support tickets with the ClearOS developers should you need further assistance.
I hope you have a relaxing weekend now! -
Accepted Answer
Woooooowww, I didn't believe anymore that this would be solvable...
Thank you Nick and Marvin for all the help.
Nick, I did it a little differently before you posted. Instead of UUID=something I wrote this at the sh-4.2# prompt:
mount -t xfs /dev/nvme0n1p3 /
mount -t xfs /dev/nvme0n1p2 /boot
mount -t vfat /dev/nvme0n1p1 /boot/efi
(if I didn't write down the filesystem (vfat or xfs) it gave me an error)
And then did grub2-mkconfig -o /boot/efi/EFI/clearos/grub.cfg
And exit.
And it booted...
Is it OK now? Or is the buggy grub still on my system? Must I do something? Erase it or something? I don't want to fear every time I reboot the PC and pray that it would boot...
OMG... almost 1 full day of messing around with this... I was messing around with FTP for more than a week, and then this...
Just one more question... HOW CAN I DISABLE GRUB UPDATE? I don't want to do this in a month or so again, when someone decides to release such a bug again...
Thanks one more time. -
Accepted Answer
This is in /etc/fstab
UUID=b719d753.... /
UUID=5dedca90.... /boot
UUID=5C18-CE03 /boot/efi
When I type in the command "blkid" in bash-4.2# then nothing happens
I entered "exit" and then it gets me to sh-4.2#
If I enter the "blkid" command here, then it writes me all the UUIDs
I can see, that the first UUID is /dev/nvme0n1p3, second UUID is /dev/nvme0n1p2 and third UUID is /dev/nvme0n1p1
If I write "mount -a -v" in sh-4.2 it doesn't do anything, just goes in another line.
If I write "mount -a -v" in bash-4.2 it says:
/ : ignored -
Accepted Answer
Now try "mount -a" to mount everything else. Otherwise have a look at your /etc/fstab and see how p1 and p2 are mounted and mount them in the same way. /boot will be empty until p2 (or p1?) is mounted into it but you will need both p1 and p2 mounted to their correct places before you proceed. -
Accepted Answer
I'm a bit confused about the status of LVM in this situation: IIRC the ClearOS installer uses LVM by default, and I didn't need to do anything special for the recovery environment to recognize my installation. Flash, did you set up custom partitioning or use auto partitioning when you installed?
* In theory * if it's a "default" installation partition style, all that should need to be mounted now is / [partition 3 for Flash?]; then proceed with step 3 (skip 4 -- already done), and then go to step 5.
I don't know what to say beyond that... the NVMe drive may be a complication here. -
Accepted Answer
Flash,
I think that's some progress, though the "no such device" is a bit worrisome. Basically now the EFI system is starting up, but it's saying, hey, things aren't where I was expecting them. At this point, if we can trigger the grub2-mkconfig command on the right partition as per the recovery instructions, we might be getting places.
To do this:
Boot back into your recovery environment again as before, using the option 3 for the shell.
Mount the main system partition.
And chroot to that. (If this doesn't work, post the output of "lsblk -f" and "pwd")
Pick up with the recovery steps at #5 and tell us how it goes. If you get errors, please post the output.
EDIT: updated instructions. If I'm thinking correctly, you will need to be in the main system partition (not the boot one that I think you were in earlier). -
Accepted Answer
So can I just copy the grubx64.efi or not?
Yes, IMO you can go for it.
I take Nick's last post to be all about how to mount an LVM partition along with the others so you could have all of them mounted in your recovery environment. I'm not sure that this is needed, but perhaps Nick has a particular reason for recommending this. Maybe for the grub reconfiguration? -
Accepted Answer
While you were posting I was writing this:
https://pario.no/2015/11/02/how-to-mount-lvm-partitions-from-rescue-mode/ and https://documentation.online.net/en/dedicated-server/rescue/mount-lvm-partition show how to mount an lvm main partition like / could be.
On my system:
[root@microserver ~]# lvm vgscan -v
Reading volume groups from cache.
Found volume group "main" using metadata type lvm2
[root@microserver ~]# lvm lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
root main -wi-ao---- 924.84g
swap main -wi-ao---- 2.00g
(it didn't like the --all switch)
My /etc/fstab has:
/dev/mapper/main-root / ext4 defaults 1 1
So, presumably:
mount /dev/mapper/main-root /mnt/sysimage
will work, but it is slightly inconsistent with the two examples I linked to, which indicate:
mount /dev/main/root /mnt/sysimage
should work, but for me the two symlink together. Note there is no need to specify the filesystem type (xfs, ext4 etc.) as it should be auto-detected.
If that works, you can chroot then issue the command "mount -a" which should mount all the other partitions. -
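Put together, the LVM route from a rescue shell might look like the sketch below. The names lvm_mount_root and run are made up, and the volume group "main" and logical volume "root" are just the ones from my system; substitute whatever "lvm lvs" reports on yours. It also assumes neither name contains a hyphen, since device-mapper doubles hyphens in /dev/mapper names.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Scan for volume groups, activate one, and mount its root LV.
lvm_mount_root() {
    vg="$1"
    lv="$2"
    target="${3:-/mnt/sysimage}"
    run lvm vgscan || return 1
    run lvm vgchange -ay "$vg" || return 1
    run mount "/dev/mapper/$vg-$lv" "$target"
}

# Usage from the rescue shell:
#   DRY_RUN=1; lvm_mount_root main root   # preview
#   DRY_RUN=0; lvm_mount_root main root   # activate and mount
```

The vgchange -ay step is there because a rescue shell does not necessarily activate volume groups on its own; if the logical volume already shows as active it does no harm.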
Accepted Answer
OK. I have now successfully mounted the p1 partition. It is a vfat partition... don't know why.
I now have a /mnt/sysimage/EFI directory. In there are BOOT, centos and clearos.
Is this OK? Should I just copy grubx64.efi from centos to clearos?
If I write chroot /mnt/sysimage I still get the same answer: failed to run command "/bin/bash": No such file or directory -
Accepted Answer
Out of interest, won't /boot show as /mnt/sysimage/boot without chrooting? Agree that you'll have to chroot before recovering.
Yes, you're correct. I thought about mentioning this, but since I'm thinking you'd need to have changed root anyway to reconfigure grub, I had left it out to keep the discussion simple.
Marvin may need to comment here.
Hmm. I'll try to help, but must say I'm not super-experienced in "non-standard" boot recovery situations!
Flash, when you're at the shell in your recovery environment (not chroot'd anywhere), run "lsblk -f" -- this should show the filesystem type for each partition, which will be helpful to specify the correct filesystem type when you mount a partition.
As far as referencing shell utilities not in a given partition when using chroot, this IBM article may be helpful. That said, Nick's comment about changing directory vs chroot is a good one to bring up here: if you just want to "see what's there", while at your recovery environment shell, you can simply "cd /mnt/sysimage/" and that should get you to the root of the mounted partition.
If this doesn't get you out of the bind, could you post the output of "ls -lh" of the root of the partitions after you mount them? (you'll either need to chroot or use the cd /mnt/sysimage/ method mentioned above). -
Accepted Answer
Flash wrote:
p1 then is probably the msdos partition on a GPT labelled disk. Then p2 will be /boot.
If I mount p1 it says: wrong fs type, bad option, bad superblock on /dev/nvme0n1p1, missing codepage or helper program, or other error
If I mount p2, then it mounts. But then, if I do chroot /mnt/sysimage it says: failed to run command "/bin/bash": No such file or directory
If I go to /mnt/sysimage then there are files in there...
If it does not run bash, does it stay in sh? It should not matter as most commands are similar but I am worried.....
... because I wonder if you need to mount both; mount / first into /mnt/sysimage then perhaps chroot to it. Then mount p2 into /boot. Marvin may need to comment here. -
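The "/bin/bash: No such file or directory" from chroot suggests that whatever is mounted at /mnt/sysimage is not the root filesystem. A quick way to tell the three partitions apart is sketched below. classify_mount is a made-up name, and the grub2 directory test is my assumption about what a CentOS 7 style /boot partition contains.

```shell
# Guess what kind of partition is mounted at a directory.
# Prints one of: root, esp, boot, unknown.
classify_mount() {
    root="${1:-/mnt/sysimage}"
    if [ -e "$root/bin/bash" ] || [ -e "$root/usr/bin/bash" ]; then
        echo "root"      # has a shell, so chroot can work
    elif [ -d "$root/EFI" ]; then
        echo "esp"       # EFI System Partition (belongs at /boot/efi)
    elif [ -d "$root/grub2" ]; then
        echo "boot"      # /boot partition (assumed layout)
    else
        echo "unknown"
    fi
}

# Usage:
#   [ "$(classify_mount /mnt/sysimage)" = root ] && chroot /mnt/sysimage
```

If it prints "boot" or "esp", unmount, mount the / partition at /mnt/sysimage first, and then mount the other two at /mnt/sysimage/boot and /mnt/sysimage/boot/efi.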
Accepted Answer
Marvin Martin wrote:
Out of interest, won't /boot show as /mnt/sysimage/boot without chrooting? Agree that you'll have to chroot before recovering.
One redundant note just to be sure we're on the same page: after you successfully mount a partition in your recovery environment, you'll need to chroot /mnt/sysimage/ before you can go looking for boot/. -
Accepted Answer
If I mount p1 it says: wrong fs type, bad option, bad superblock on /dev/nvme0n1p1, missing codepage or helper program, or other error
If I mount p2, then it mounts. But then, if I do chroot /mnt/sysimage it says: failed to run command "/bin/bash": No such file or directory
If I go to /mnt/sysimage then there are files in there...