Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > linux.debian.bugs.dist > #1090341
| From | Jean-Paul Larocque <jpl-debian-bts@thoughtcrime.us> |
|---|---|
| Newsgroups | linux.debian.bugs.dist |
| Subject | Bug#1004139: ifenslave: Hotplugged interfaces sometimes not added to bond-master due to ifquery race condition |
| Date | 2022-01-21 17:00 +0100 |
| Message-ID | <DI4uu-8dV-5@gated-at.bofh.it> (permalink) |
| Organization | linux.* mail to news gateway |
[Multipart message — attachments visible in raw view] - view raw
Package: ifenslave
Version: 2.13
Severity: normal
Hi,
I have a simple bonding configuration with two physical Ethernet
interfaces, both defined with the `allow-hotplug` option. I use
allow-hotplug because the interfaces are on USB. And since I'm
using allow-hotplug, I chose the style of configuration where I use
the `bond-master` configuration option under each slave configuration
stanza, rather than `bond-slaves` in the bonding master interface
stanza.
This configuration is similar to the one in
examples/two_hotplug_ethernet. I'm attaching a copy of my
/etc/network/interfaces file.
The problem is that sometimes after a reboot, I find that one of my
interfaces (wlx0013efd01275: an external, wireless USB interface) is
not a member of the bond:
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enxb827eb9e4634: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP group default qlen 1000
link/ether b8:27:eb:9e:46:34 brd ff:ff:ff:ff:ff:ff
3: wlx0013efd01275: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:13:ef:d0:12:75 brd ff:ff:ff:ff:ff:ff
inet6 2002:ce3f:e590:2:213:efff:fed0:1275/64 scope global dynamic mngtmpaddr
valid_lft 86179sec preferred_lft 14179sec
inet6 2002:ce3f:e590:1:213:efff:fed0:1275/64 scope global dynamic mngtmpaddr
valid_lft 86179sec preferred_lft 14179sec
inet6 fe80::213:efff:fed0:1275/64 scope link
valid_lft forever preferred_lft forever
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether b8:27:eb:9e:46:34 brd ff:ff:ff:ff:ff:ff
inet 192.168.66.19/21 brd 192.168.71.255 scope global bond0
valid_lft forever preferred_lft forever
inet6 2002:ce3f:e590:1:1::19/64 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::ba27:ebff:fe9e:4634/64 scope link
valid_lft forever preferred_lft forever
$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v5.10.0-10-rpi
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: enxb827eb9e4634 (primary_reselect always)
Currently Active Slave: enxb827eb9e4634
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0
Slave Interface: enxb827eb9e4634
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b8:27:eb:9e:46:34
Slave queue ID: 0
$ /sbin/ifquery --state
lo=lo
bond0=bond0
wlx0013efd01275=wlx0013efd01275
enxb827eb9e4634=enxb827eb9e4634
In this situation, this is logged to syslog (via systemd):
sh[299]: Failed to enslave wlx0013efd01275 to bond0.
I think I understand the cause, and propose a workaround or a bug fix
(depending on how you look at it).
When both devices are present at boot, two ifup@<iface>.service units
(one for each interface) are started simultaneously. It seems like
the ifenslave ifupdown scripts are meant to handle this, but in
/etc/network/if-pre-up.d/ifenslave, in the definition of setup_slave_device(),
there is:
# Ensure the master is up or being configured
export IFENSLAVE_ENV_NAME="IFUPDOWN_$IF_BOND_MASTER"
IFUPDOWN_IF_BOND_MASTER="$(printenv "$IFENSLAVE_ENV_NAME")"
unset IFENSLAVE_ENV_NAME
if [ -z "$IFUPDOWN_IF_BOND_MASTER" ] ; then
ifquery --state "$IF_BOND_MASTER" >/dev/null 2>&1 || ifup "$IF_BOND_MASTER"
fi
I've added loads of debugging and done many reboot cycles to find out
that when the problem occurs (or at least in one case), both
simultaneously-running processes get to the `ifquery` line. One of
the processes executes ifquery and gets a non-zero return code,
leading it to run `ifup "$IF_BOND_MASTER"`. After ifup starts, the
other script process executes ifquery and gets a zero return code.
In this case, the bond interface hasn't come up yet, but the command
to bring it up has started, which I think is why ifquery is returning
zero here. I can reproduce this behavior of ifquery with a dummy
interface:
iface dummy0 inet manual
pre-up modprobe dummy
pre-up sleep 5
up ip link add dummy0 type dummy
down ip link del dummy0
$ q() { sudo ifquery --state dummy0; echo " => $?"; }
$ q
=> 1
$ sudo ifup dummy0 & sleep 1; q; ip link show dummy0; ps $!
[1] 7956
dummy0=dummy0
=> 0
Device "dummy0" does not exist.
PID TTY STAT TIME COMMAND
7956 pts/1 S 0:00 sudo ifup dummy0
This shows that ifquery does return 0, even while ifup is still
working.
From my reading of ifquery(8), this is possibly a bug ("successful
code is returned if all of interfaces given as arguments are up").
Either ifquery is supposed to return failure in this case (meaning
ifquery has a bug, since the interface hasn't finished coming up yet),
or it is valid for it to return successfully since an ifup process has
been _started_ for this interface (in which case the ifenslave script
has a bug--but ifquery(8) could probably be improved). Regardless of
whether ifquery has a bug here, the ifenslave script could be
simplified in a way that works around the issue anyway:
# [omitted context]
if [ -z "$IFUPDOWN_IF_BOND_MASTER" ] ; then
ifup "$IF_BOND_MASTER"
fi
Now, if ifup gets started concurrently for the same bond master
interface, the second process blocks (due to ifupdown locking) until
the first process completes. Both processes running
if-pre-up.d/ifenslave would now continue after bond0 has finished its
configuration.
ifup is idempotent, so there's no harm in running it every time. In
fact, with how the script is written now, there's a race condition
where ifquery could return failure, and in-between the return and the
ifup, the interface could come up. If that happens, ifup gets run on
a bond interface in the up state. Because of the idempotence of ifup,
there's no harm in this hypothetical race condition, but the script
already relies on it, so I'd suggest removing the ifquery condition
altogether.
The only downside I see to running `ifup` every time is that I now get
extra messages in my bootup scenario:
ifup[283]: ifup: waiting for lock on /run/network/ifstate.bond0
sh[311]: ifup: interface bond0 already configured
But if I understand correctly, those are informational, and not
indicating an error.
When I remove that ifquery condition, I no longer encounter the
problem, even after letting this machine reboot in a loop all night.
I'm attaching my modified version of the script.
Do you think this fix could be accepted? If you think it'd be more
appropriate for me to file a bug against ifupdown regarding ifquery,
please let me know.
Thanks,
-Jean-Paul
-- System Information:
Debian Release: 11.2
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: armel (armv6l)
Kernel: Linux 5.10.0-10-rpi
Kernel taint flags: TAINT_CRAP, TAINT_UNSIGNED_MODULE
Locale: LANG=C, LC_CTYPE=C (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages ifenslave depends on:
ii ifupdown 0.8.36
ii iproute2 5.10.0-4
Versions of packages ifenslave recommends:
ii net-tools 1.60+git20181103.0eebece-1
ifenslave suggests no packages.
-- no debconf information
Back to linux.debian.bugs.dist | Previous | Next | Find similar
Bug#1004139: ifenslave: Hotplugged interfaces sometimes not added to bond-master due to ifquery race condition Jean-Paul Larocque <jpl-debian-bts@thoughtcrime.us> - 2022-01-21 17:00 +0100
csiph-web