#58320 - Hurd VM fails to boot on AMD EPYC (kvm-amd)

GNU bug report logs - #58320
Hurd VM fails to boot on AMD EPYC (kvm-amd)

Package: guix

Reported by: Ludovic Courtès <ludo <at> gnu.org>

Date: Wed, 5 Oct 2022 21:02:01 UTC

Severity: normal

Tags: wontfix

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

From: Ludovic Courtès <ludo <at> gnu.org>
To: 58320 <at> debbugs.gnu.org
Cc: bug-hurd <at> gnu.org
Subject: bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)
Date: Mon, 10 Oct 2022 23:14:15 +0200

Ludovic Courtès <ludo <at> gnu.org> skribis:

> Through a dichotomy I tried to see how far it goes.  The info I have so
> far is that ld.so errors out from elf/rtld.c:563 (line 565 is not
> reached):
>
>   558:   if (bootstrap_map.l_addr || ! bootstrap_map.l_info[VALIDX(DT_GNU_PRELINKED)])
>   559:     {
>   560:       /* Relocate ourselves so we can do normal function calls and
>   561:          data access using the global offset table.  */
>   562:
>   563:       ELF_DYNAMIC_RELOCATE (&bootstrap_map, 0, 0, 0);
>   564:     }
>   565:   bootstrap_map.l_relocated = 1;
>   ...
>   578:   __rtld_malloc_init_stubs ();

Via brute force¹, I found that ‘__assert_fail’ is hit, with its first
argument in $eax being:

--8<---------------cut here---------------start------------->8---
db> x/c 0x28604,80
ELF32_R_TYPE (reloc->r_info) == R_386_RELATIVE\000\000map->l_in
fo[VERSYMIDX (DT_VERSYM)] != NULL\000\000Fatal glibc error: Too many audit mo
--8<---------------cut here---------------end--------------->8---

This comes from i386/dl-machine.h:

--8<---------------cut here---------------start------------->8---
auto inline void
__attribute ((always_inline))
elf_machine_rel_relative (Elf32_Addr l_addr, const Elf32_Rel *reloc,
                          void *const reloc_addr_arg)
{
  Elf32_Addr *const reloc_addr = reloc_addr_arg;
  assert (ELF32_R_TYPE (reloc->r_info) == R_386_RELATIVE);
  *reloc_addr += l_addr;
}
--8<---------------cut here---------------end--------------->8---

How can we get there?  Looking at ‘_dl_start’, it could be that
‘elf_machine_load_address’ returns a bogus value and we end up reading
wrong ELF data?  Or it could be memory corruption somewhere.  Or…?

Thing is, it’s not fully deterministic (happens 9 times out of 10 with
KVM, never happens without KVM).

Ideas?  :-)

Ludo’.

¹ Building with ‘-fno-optimize-sibling-calls’ didn’t help get nicer
  backtraces, but that’s prolly because all that early relocation code
  is inlined.
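
For readers unfamiliar with the check that fails above, here is a rough, standalone sketch of what a relative-relocation pass does and why reading the relocation table from the wrong place would trip the type check. It is not glibc code: ‘sketch_rel’, ‘SKETCH_R_TYPE’, ‘SKETCH_R_386_RELATIVE’ and ‘sketch_relocate_relative’ are hypothetical names that only mirror the i386 ELF layout (type in the low 8 bits of ‘r_info’, value 8 for a relative relocation), and the sketch assumes word-aligned offsets into a small in-memory buffer.

--8<---------------cut here---------------start------------->8---
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring the i386 ELF definitions (Elf32_Rel,
   ELF32_R_TYPE, R_386_RELATIVE) so that the sketch builds anywhere.  */
typedef struct
{
  uint32_t r_offset;   /* byte offset of the word to patch */
  uint32_t r_info;     /* symbol index (high 24 bits), type (low 8 bits) */
} sketch_rel;

#define SKETCH_R_TYPE(info)    ((info) & 0xff)
#define SKETCH_R_386_RELATIVE  8

/* Apply COUNT relative relocations from REL to IMAGE, a buffer of
   IMAGE_WORDS 32-bit words standing in for the loaded object, using
   L_ADDR as the pretend load address.

   Return the number of entries applied.  A result smaller than COUNT is
   the index of the first entry that was rejected, either because its
   type is not "relative" (the condition glibc expresses as an assert)
   or because its offset is unaligned or outside the buffer (a
   simplification made by this sketch only).  */
size_t
sketch_relocate_relative (uint32_t *image, size_t image_words,
                          uint32_t l_addr,
                          const sketch_rel *rel, size_t count)
{
  for (size_t i = 0; i < count; i++)
    {
      if (SKETCH_R_TYPE (rel[i].r_info) != SKETCH_R_386_RELATIVE)
        return i;

      size_t word = rel[i].r_offset / sizeof (uint32_t);
      if (rel[i].r_offset % sizeof (uint32_t) != 0 || word >= image_words)
        return i;

      /* Same operation as ‘*reloc_addr += l_addr’ in the real code.  */
      image[word] += l_addr;
    }

  return count;
}
```
--8<---------------cut here---------------end--------------->8---

If the table pointer itself were derived from a wrong load address, the entries would be arbitrary bytes, and the low byte of ‘r_info’ would most likely not be 8; in this sketch that shows up as an early return rather than an assertion failure.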

This bug report was last modified 318 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
