GNU bug report logs -
#58320
Hurd VM fails to boot on AMD EPYC (kvm-amd)
Reported by: Ludovic Courtès <ludo <at> gnu.org>
Date: Wed, 5 Oct 2022 21:02:01 UTC
Severity: normal
Tags: wontfix
Done: Ludovic Courtès <ludo <at> gnu.org>
Bug is archived. No further changes may be made.
Message #35 received at 58320 <at> debbugs.gnu.org (full text, mbox):
Ludovic Courtès <ludo <at> gnu.org> skribis:
> By bisecting, I tried to see how far it gets. The info I have so
> far is that ld.so errors out from elf/rtld.c:563 (line 565 is not
> reached):
>
> 558:   if (bootstrap_map.l_addr || ! bootstrap_map.l_info[VALIDX(DT_GNU_PRELINKED)])
> 559:     {
> 560:       /* Relocate ourselves so we can do normal function calls and
> 561:          data access using the global offset table. */
> 562:
> 563:       ELF_DYNAMIC_RELOCATE (&bootstrap_map, 0, 0, 0);
> 564:     }
> 565:   bootstrap_map.l_relocated = 1;
> ...
> 578: __rtld_malloc_init_stubs ();
Via brute force¹, I found that ‘__assert_fail’ is hit, with its first
argument in $eax being:
--8<---------------cut here---------------start------------->8---
db> x/c 0x28604,80
ELF32_R_TYPE (reloc->r_info) == R_386_RELATIVE\000\000map->l_in
fo[VERSYMIDX (DT_VERSYM)] != NULL\000\000Fatal glibc error: Too
many audit mo
--8<---------------cut here---------------end--------------->8---
This comes from i386/dl-machine.h:
--8<---------------cut here---------------start------------->8---
auto inline void
__attribute ((always_inline))
elf_machine_rel_relative (Elf32_Addr l_addr, const Elf32_Rel *reloc,
                          void *const reloc_addr_arg)
{
  Elf32_Addr *const reloc_addr = reloc_addr_arg;
  assert (ELF32_R_TYPE (reloc->r_info) == R_386_RELATIVE);
  *reloc_addr += l_addr;
}
}
--8<---------------cut here---------------end--------------->8---
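To make the failing check concrete, here is a minimal standalone sketch of a loop that applies relative relocations to a buffer. It is not glibc's code: the `sketch_*` names are hypothetical, the type and macro definitions are local stand-ins for the ones in <elf.h>, and instead of asserting it reports the index of the first entry whose type is not R_386_RELATIVE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the <elf.h> definitions (hypothetical names).  */
typedef uint32_t sketch_addr;
typedef struct { sketch_addr r_offset; uint32_t r_info; } sketch_rel;
#define SKETCH_R_TYPE(info) ((info) & 0xff)
#define SKETCH_R_386_RELATIVE 8

/* Apply COUNT relative relocations to IMAGE, a buffer standing in for
   the object as loaded at L_ADDR.  Return -1 if every entry was
   applied, else the index of the first entry that was rejected
   (wrong type, or offset outside the buffer).  */
static int
sketch_apply_relative (uint8_t *image, size_t image_size,
                       sketch_addr l_addr,
                       const sketch_rel *rel, size_t count)
{
  assert (image != NULL && rel != NULL);
  for (size_t i = 0; i < count; i++)
    {
      /* This is the condition the assertion in the snippet above checks.  */
      if (SKETCH_R_TYPE (rel[i].r_info) != SKETCH_R_386_RELATIVE)
        return (int) i;
      if (image_size < sizeof (sketch_addr)
          || rel[i].r_offset > image_size - sizeof (sketch_addr))
        return (int) i;

      /* Same effect as "*reloc_addr += l_addr", done through memcpy
         so that unaligned offsets are safe in this sketch.  */
      sketch_addr word;
      memcpy (&word, image + rel[i].r_offset, sizeof word);
      word += l_addr;
      memcpy (image + rel[i].r_offset, &word, sizeof word);
    }
  return -1;
}
```

A table containing only type-8 entries is applied in full; the first entry with any other type is the point where the real code would hit ‘__assert_fail’.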
How can we get there? Looking at ‘_dl_start’, it could be that
‘elf_machine_load_address’ returns a bogus value and we end up reading
wrong ELF data? Or it could be memory corruption somewhere. Or…?
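As an illustration of the first hypothesis only (it is not a diagnosis), the sketch below shows how a table start that is off by a few bytes turns valid relocation entries into ones that fail the type check. The buffer layout and the `demo_*` names are hypothetical; the definitions are local stand-ins for those in <elf.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the <elf.h> definitions (hypothetical names).  */
typedef struct { uint32_t r_offset; uint32_t r_info; } demo_rel;
#define DEMO_R_TYPE(info) ((info) & 0xff)
#define DEMO_R_386_RELATIVE 8

/* Read COUNT relocation entries starting TABLE_OFF bytes into IMAGE
   and return how many of them have type R_386_RELATIVE.  Entries that
   would extend past the end of the buffer are not read.  */
static size_t
demo_count_relative (const uint8_t *image, size_t image_size,
                     size_t table_off, size_t count)
{
  assert (image != NULL);
  size_t ok = 0;
  for (size_t i = 0; i < count; i++)
    {
      size_t off = table_off + i * sizeof (demo_rel);
      if (off > image_size || image_size - off < sizeof (demo_rel))
        break;
      demo_rel r;
      memcpy (&r, image + off, sizeof r);
      if (DEMO_R_TYPE (r.r_info) == DEMO_R_386_RELATIVE)
        ok++;
    }
  return ok;
}
```

With three valid entries stored at offset 16 of a zeroed buffer, reading from offset 16 finds three relative relocations, whereas reading from offset 20 pairs each `r_info` with the following `r_offset` and finds none of the expected type.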
Thing is, it’s not fully deterministic (happens 9 times out of 10 with
KVM, never happens without KVM).
Ideas? :-)
Ludo’.
¹ Building with ‘-fno-optimize-sibling-calls’ didn’t help get nicer
backtraces, but that’s probably because all that early relocation code
is inlined.
This bug report was last modified 270 days ago.