Bug#1130930: bisect log

Started by: Salvatore Bonaccorso <carnil@debian.org>
First post: 2026-03-22 09:20 +0100
Last post: 2026-03-24 08:00 +0100
Articles: 2 (1 participant)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Bug#1130930: bisect log Salvatore Bonaccorso <carnil@debian.org> - 2026-03-22 09:20 +0100
    Bug#1130930: bisect log Salvatore Bonaccorso <carnil@debian.org> - 2026-03-24 08:00 +0100

#91771 — Bug#1130930: bisect log

From: Salvatore Bonaccorso <carnil@debian.org>
Date: 2026-03-22 09:20 +0100
Subject: Bug#1130930: bisect log
Message-ID: <MBmcp-9oRC-5@gated-at.bofh.it>

Hi Aman,

On Fri, Mar 20, 2026 at 10:58:40PM +0530, Aman Dhoot wrote:
> As you asked, I bisected the kernel, and this is the log:
> 
> ****************************************************************
> 
> $ git bisect log
> git bisect start
> # status: waiting for both good and bad commits
> # good: [567bd8cbc2fe6b28b78864cbbbc41b0d405eb83c] Linux 6.12.63
> git bisect good 567bd8cbc2fe6b28b78864cbbbc41b0d405eb83c
> # status: waiting for bad commit, 1 good commit known
> # bad: [ff2177382799753070b71747f646963147eabc7c] Linux 6.12.69
> git bisect bad ff2177382799753070b71747f646963147eabc7c
> # good: [ebdbe19336f26ffe799db842d751745098dc11ff] ASoC: renesas: rz-ssi: Fix rz_ssi_priv::hw_params_cache::sample_width
> git bisect good ebdbe19336f26ffe799db842d751745098dc11ff
> # bad: [e79b03d386341e85a4f775e0a864e8aa7633a0a2] HID: intel-ish-hid: Use dedicated unbound workqueues to prevent resume blocking
> git bisect bad e79b03d386341e85a4f775e0a864e8aa7633a0a2
> # good: [feb28b6827ece47cce585599a00b02ee579532bc] powercap: fix sscanf() error return value handling
> git bisect good feb28b6827ece47cce585599a00b02ee579532bc
> # good: [68495f89a19b6835e388b89b2ffecc0c68f9666c] selftests/landlock: Fix TCP bind(AF_UNSPEC) test case
> git bisect good 68495f89a19b6835e388b89b2ffecc0c68f9666c
> # good: [4433ddc3700cea880c383a6ddfc0e2ab697f9bdf] EDAC/x38: Fix a resource leak in x38_probe1()
> git bisect good 4433ddc3700cea880c383a6ddfc0e2ab697f9bdf
> # bad: [94b010200a3c9a8420a9063344cedbcd71794c8f] LoongArch: dts: loongson-2k0500: Add default interrupt controller address cells
> git bisect bad 94b010200a3c9a8420a9063344cedbcd71794c8f
> # good: [654fa76032eee5df9ce8849bdff840595952c63d] mm/page_alloc: make percpu_pagelist_high_fraction reads lock-free
> git bisect good 654fa76032eee5df9ce8849bdff840595952c63d
> # bad: [8140ac7c55e75093a01c6110a2c4025fe7177c57] drm/amd: Clean up kfd node on surprise disconnect
> git bisect bad 8140ac7c55e75093a01c6110a2c4025fe7177c57
> # good: [df7a49b328928b6d6b174d954d63721d6f3848a2] LoongArch: Fix PMU counter allocation for mixed-type event groups
> git bisect good df7a49b328928b6d6b174d954d63721d6f3848a2
> # good: [ae5b1d291c814a2884c3d54a56e83bc99052b1eb] drm/amd/display: Bump the HDMI clock to 340MHz
> git bisect good ae5b1d291c814a2884c3d54a56e83bc99052b1eb
> # first bad commit: [8140ac7c55e75093a01c6110a2c4025fe7177c57] drm/amd: Clean up kfd node on surprise disconnect
> 
> **********************************************************************************************
> 
> When the bisect finished, it printed this output:
> 
> 
> 8140ac7c55e75093a01c6110a2c4025fe7177c57 is the first bad commit
> commit 8140ac7c55e75093a01c6110a2c4025fe7177c57
> Author: Mario Limonciello (AMD) <superm1@kernel.org>
> Date:   Wed Jan 7 15:37:28 2026 -0600
> 
>     drm/amd: Clean up kfd node on surprise disconnect
> 
>     commit 28695ca09d326461f8078332aa01db516983e8a2 upstream.
> 
>     When an eGPU is unplugged the KFD topology should also be destroyed
>     for that GPU. This never happens because the fini_sw callbacks never
>     get to run. Run them manually before calling amdgpu_device_ip_fini_early()
>     when a device has already been disconnected.
> 
>     This location is intentionally chosen to make sure that the kfd locking
>     refcount doesn't get incremented unintentionally.
> 
>     Cc: kent.russell@amd.com
>     Closes: https://community.frame.work/t/amd-egpu-on-linux/8691/33
>     Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
>     Reviewed-by: Kent Russell <kent.russell@amd.com>
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>     (cherry picked from commit 6a23e7b4332c10f8b56c33a9c5431b52ecff9aab)
>     Cc: stable@vger.kernel.org
>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> 
> As far as I can tell, this commit is included in kernel version 6.12.66, and
> the problem is also present in v6.12.66.

Thanks for doing that. It looks like this is a regression fixed by
f7afda7fcd16 ("drm/amd: Fix hang on amdgpu unload by using
pci_dev_is_disconnected()"), which was also backported to 6.12.77.

If possible, it would be great if you could test whether this indeed
fixes the problem. Cf. https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#id-1.6.6.4

Regards,
Salvatore
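
A note for readers following along: the statement that the first bad commit is already part of 6.12.66 can be checked against a clone of the linux-stable tree. The sketch below is not from the thread. The function names `first_bad_commit` and `stable_tags_containing` are made up for illustration, and it assumes the bisect log was saved to a file and that the v6.12.y tags are available locally.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the thread.

# Read a `git bisect log` (optionally with "> " mail quoting) on stdin and
# print the hash from its "# first bad commit: [<hash>] ..." line.
first_bad_commit() {
    sed -n \
        -e 's/^[> ]*//' \
        -e 's/^# first bad commit: \[\([0-9a-f][0-9a-f]*\)\].*/\1/p'
}

# List the v6.12.y tags whose history contains commit $1, oldest first.
# Assumes the current directory is a linux-stable clone with tags fetched.
stable_tags_containing() {
    git tag --sort=version:refname --contains "$1" --list 'v6.12.*'
}
```

With the log saved as, say, `bisect.log` (a placeholder name), `stable_tags_containing "$(first_bad_commit < bisect.log)" | head -n 1` would print the earliest 6.12.y tag that carries the commit. The same check with the fix commit shows which release picked up the fix.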



#91786

From: Salvatore Bonaccorso <carnil@debian.org>
Date: 2026-03-24 08:00 +0100
Message-ID: <MC3U6-9SDj-9@gated-at.bofh.it>
In reply to: #91771

Hi Aman,

On Mon, Mar 23, 2026 at 07:50:42PM +0530, Aman Dhoot wrote:
> Hi, Salvatore
> 
> Yesterday, I tested the patch f7afda7fcd16 ("drm/amd: Fix hang on amdgpu
> unload by using pci_dev_is_disconnected()") on Debian linux-source-6.12. I
> can confirm that it fixed the issue: the hook script now runs completely and
> the VM starts, so it is working as expected.

Thanks for confirming that.

> Could you tell me when this patch will be merged into the stable Trixie
> kernel (v6.12) or into the Trixie backports kernel?

It is included in 6.12.77 and so will be picked up by the next trixie
upload for 6.12.y.

Regards,
Salvatore
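
Since the fix ships in 6.12.77, whether a given trixie kernel already carries it comes down to comparing the upstream part of its version against that release. Below is a minimal sketch, assuming the version string is passed in by hand (for instance the Debian package version). The helper name `includes_fix` is hypothetical, and it only reasons about the 6.12.y series, the only one for which this thread gives a fixed release.

```shell
#!/bin/sh
# Hypothetical helper, not part of the thread.
# Returns 0 if version $1 (e.g. "6.12.77" or "6.12.77-1") is 6.12.y with
# y >= 77, 1 if it is an older or unparsable 6.12.y, and 2 for any other
# series, for which this thread gives no fixed release.
includes_fix() {
    case "$1" in
        6.12.*)
            patch=${1#6.12.}          # drop the "6.12." prefix
            patch=${patch%%[!0-9]*}   # drop a suffix such as "-1"
            [ -n "$patch" ] && [ "$patch" -ge 77 ]
            ;;
        *)
            return 2
            ;;
    esac
}
```

For example, `includes_fix 6.12.69 || echo "fix not included yet"` prints the message, because 6.12.69 predates 6.12.77.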
