GNU bug report logs - #58320
Hurd VM fails to boot on AMD EPYC (kvm-amd)


Package: guix;

Reported by: Ludovic Courtès <ludo <at> gnu.org>

Date: Wed, 5 Oct 2022 21:02:01 UTC

Severity: normal

Tags: wontfix

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.


From: Ludovic Courtès <ludo <at> gnu.org>
To: 58320 <at> debbugs.gnu.org
Cc: bug-hurd <at> gnu.org
Subject: bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)
Date: Sun, 09 Oct 2022 18:09:07 +0200
Hi!

Ludovic Courtès <ludo <at> gnu.org> wrote:

> $ addr2line -e  /gnu/store/m8afvcgwmrfhvjpd7b0xllk8vv5isd6j-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1 0x1000 0x11627 0x11bb
> ??:0
> /tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/glibc-2.33/elf/dl-misc.c:333
> :?
>
>
> That’s ‘_dl_fatal_printf’ calling ‘_exit’; it’s trying to tell us
> something.
>
> I’ll try and rebuild the system with the debugging patches at
> <https://lists.gnu.org/archive/html/bug-hurd/2011-11/msg00038.html>, to
> get early ld.so output, for lack of a better solution…

I adapted the patches above and tried them, but it seems that
‘_dl_sysdep_start’ isn’t even reached.  For example, I set a breakpoint
on ‘mach_task_self’ (called from ‘__mach_init’, which is called from
‘_dl_sysdep_start’), but it is never hit (I’m assuming ‘break/tu’
is reliable; is it?).

The user-space backtrace upon trap remains unhelpful:

--8<---------------cut here---------------start------------->8---
start ext2fs: Hurd server bootstrap: ext2fs[device:hd0s1]Kernel Breakpoint trap,
 eip 0xc1030d5b
Breakpoint at  task_resume:     pushl   %ebp
db> debug traps /on
db> b task_terminate
set breakpoint #2
db> c
Kernel Debug trap trap, eip 0xc1030d5b
 execkernel: Page fault (14), code=6
Stopped at  0x1000:     pushl   0x4(%ebx)
>>>>> user space <<<<<
0x1000(bfffff24,0,0,1160b,0)
0x11627(bfffff9c,0,0,0,2)
0x11bb()
--8<---------------cut here---------------end--------------->8---

… where:

--8<---------------cut here---------------start------------->8---
$ addr2line -e /gnu/store/4p1kab1c4h7h3kvgcm1hbjja4y5k9x4p-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1 0x11627 0x11bb
/tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/glibc-2.33/elf/dl-misc.c:333
:?
$ objdump -S /gnu/store/4p1kab1c4h7h3kvgcm1hbjja4y5k9x4p-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1 --start-address=0x000011b0 |head -40

/gnu/store/4p1kab1c4h7h3kvgcm1hbjja4y5k9x4p-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1:     file format elf32-i386


Disassembly of section .text:

000011b0 <_start>:
    11b0:       89 e0                   mov    %esp,%eax
    11b2:       83 ec 0c                sub    $0xc,%esp
    11b5:       50                      push   %eax
    11b6:       e8 b5 0a 00 00          call   1c70 <_dl_start>
    11bb:       83 c4 10                add    $0x10,%esp
--8<---------------cut here---------------end--------------->8---

So it would seem that ‘_dl_start’ is called and then somehow makes a
tail call to ‘_dl_fatal_printf’.

By bisection, I tried to see how far it gets.  What I know so far is
that ld.so errors out at elf/rtld.c:563 (line 565 is never reached):

--8<---------------cut here---------------start------------->8---
558:  if (bootstrap_map.l_addr || ! bootstrap_map.l_info[VALIDX(DT_GNU_PRELINKED)])
559:    {
560:      /* Relocate ourselves so we can do normal function calls and
561:         data access using the global offset table.  */
562:
563:      ELF_DYNAMIC_RELOCATE (&bootstrap_map, 0, 0, 0);
564:    }
565:  bootstrap_map.l_relocated = 1;
...
578:  __rtld_malloc_init_stubs ();
--8<---------------cut here---------------end--------------->8---

It’s hard to be more precise because ELF_DYNAMIC_RELOCATE is a macro
that expands to quite a lot of code.

I don’t see which code path would lead to a ‘_dl_fatal_printf’ call,
though.

Ideas?  :-)

Ludo’.




This bug report was last modified 270 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.