rc-sysinit job might start before loopback is up

Bug #461725 reported by Thierry Carrez
This bug affects 4 people
Affects             Status         Importance   Assigned to      Milestone
upstart (Ubuntu)    Fix Released   Medium       Steve Langasek
  Karmic            Fix Released   Medium       Steve Langasek
  Lucid             Fix Released   Medium       Steve Langasek

Bug Description

Binary package hint: upstart

9.10RC, upstart 0.6.3-10
In previous releases, networking would get started in rcS.d/S40networking, before the system switches to runlevel 2.
Now there is a race between the networking and rc-sysinit jobs that may result in some early daemons in runlevel 2 being started before loopback is available (e.g. rc2.d/S15dnsmasq).
According to slangasek this is a bug since "everything that rcS did for us before should still be guaranteed".

Detail:
mountall: start on startup, emits local-filesystems, virtual-filesystems and filesystem
udev: start on virtual-filesystems
udevtrigger: start on (startup and started udev)
networking: start on (local-filesystems and stopped udevtrigger)
rc-sysinit: start on filesystem

Filed against upstart since /etc/init/rc-sysinit.conf is shipped by upstart itself.
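
The ordering above can be read straight from the job files. A minimal sketch, assuming the jobs live in /etc/init and use single-line "start on" stanzas; the function name show_start_conditions is made up for illustration and is not part of upstart:

```shell
# Print the "start on" condition of each named job in an Upstart job directory.
# Usage: show_start_conditions DIR JOB...
# Hypothetical helper; only single-line "start on" stanzas are handled.
show_start_conditions() {
    dir=$1
    shift
    for job in "$@"; do
        conf="$dir/$job.conf"
        if [ -r "$conf" ]; then
            # Strip the "start on" keyword and keep the first matching line
            cond=$(sed -n 's/^start on[[:space:]]*//p' "$conf" | head -n 1)
            printf '%s: %s\n' "$job" "$cond"
        else
            printf '%s: (no job file)\n' "$job"
        fi
    done
}

# Example against a live system (the directory is an assumption):
# show_start_conditions /etc/init mountall udev udevtrigger networking rc-sysinit
```

Nothing in the rc-sysinit condition listed above refers to networking, which is why the two jobs can run in either order.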

Thierry Carrez (ttx) wrote :

Some IRC context:
<soren> I'm just trying to understand if (and if so, how) we know that the loopback device is available once we start going through rc2.d/S* scripts.
<soren> ..since that used to be the case (given that networking ran at rcS.d/S40)
<ttx> ..and some server software kinda expects at least loopback to be up when started.
<ttx> for example, rc2.d/S15dnsmasq
<slangasek> soren: we don't; it's racy; file a bug on upstart
<soren> slangasek: You agree that having lo available in runlevel 2 is a reasonable expectation?
<slangasek> soren: yes, everything that rcS did for us before should still be guaranteed
<slangasek> unfortunately, switching /half/ of our init system has introduced a few race conditions

Scott James Remnant (Canonical) (canonical-scott) wrote :

I'll think about whether I consider this a bug or not

Changed in upstart (Ubuntu):
status: New → Incomplete
importance: Undecided → Medium
Simon Kelley (simon-thekelleys) wrote :

More context from the dnsmasq side of things:

http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2009q4/003369.html

Missing from the public archive is the result of adding "ip addr show" to the dnsmasq startup script. It looks like this:

1: lo: <LOOPBACK> mtu 16436 qdisc noop state DOWN
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
     link/ether 00:1f:c6:85:29:28 brd ff:ff:ff:ff:ff:ff
3: wmaster0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
     link/ieee802.11 00:24:01:12:87:8d brd ff:ff:ff:ff:ff:ff
4: wlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
     link/ether 00:24:01:12:87:8d brd ff:ff:ff:ff:ff:ff

Simon.
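
A check like the one Simon added by hand can be automated by parsing the same output. This is only a sketch: iface_is_up is a hypothetical helper name, and it assumes the header-line layout shown above ("N: name: <FLAGS> ...").

```shell
# iface_is_up NAME
# Reads `ip addr show` output on stdin and succeeds (exit 0) only if the
# header line for NAME carries the UP flag. Hypothetical helper.
iface_is_up() {
    awk -v name="$1" '
        # Header lines look like: "1: lo: <LOOPBACK,UP,LOWER_UP> mtu ..."
        $2 == (name ":") {
            flags = $3
            gsub(/[<>]/, "", flags)
            n = split(flags, f, ",")
            for (i = 1; i <= n; i++) if (f[i] == "UP") up = 1
        }
        END { exit !up }
    '
}

# Example in an init script (illustrative only):
# ip addr show | iface_is_up lo || echo "lo is not up yet" >&2
```

Run against the output above, the check fails for lo, because its flag list is just <LOOPBACK>.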

Jamie Strandboge (jdstrand) wrote :

I just came across this bug, wrt ufw. Traditionally, it has been expected that lo would be up before network daemons had started (eg, runlevel 2). It has become common practice to use an ifupdown script to bring up a firewall when lo is brought up. Ufw uses this practice and has the following in its upstart job: 'start on net-device-added INTERFACE=lo'. If this bug is not going to be fixed, then ufw must be adjusted so that firewall rules are in effect before network daemons start.

Jamie Strandboge (jdstrand) wrote :

I should also mention that it is also desirable that ufw start before any physical devices are brought up (eg when an admin wants to block dhcp packets from all but a specific server).

Steve Langasek (vorlon) wrote :

Jamie,

> start on net-device-added INTERFACE=lo

This is a separate bug in ufw: even after fixing this bug, there is no guarantee that ufw will have finished initializing before upstart starts to bring up other services in parallel. You need to change this instead to something like (untested): 'start on starting network-interface INTERFACE=lo'.

Steve Langasek (vorlon)
Changed in upstart (Ubuntu):
status: Incomplete → Triaged
Changed in upstart (Ubuntu Lucid):
status: Triaged → Fix Committed
Changed in upstart (Ubuntu Karmic):
importance: Undecided → Medium
status: New → Fix Committed
assignee: nobody → Steve Langasek (vorlon)
Changed in upstart (Ubuntu Lucid):
assignee: nobody → Steve Langasek (vorlon)
Steve Langasek (vorlon) wrote :

upstart 0.6.3-11 uploaded to karmic-proposed (candidate for copying to lucid once built & alpha-1 is out of the way).

Changed in upstart (Ubuntu Karmic):
status: Fix Committed → In Progress
Martin Pitt (pitti) wrote : Please test proposed package

Accepted upstart into karmic-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in upstart (Ubuntu Karmic):
status: In Progress → Fix Committed
tags: added: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 0.6.3-11

---------------
upstart (0.6.3-11) karmic-proposed; urgency=low

  * Make rc-sysinit.conf wait on the loopback interface, to ensure that the
    interface is up before we process the scripts in /etc/rc?.d. LP: #461725.
 -- Steve Langasek <email address hidden> Tue, 08 Dec 2009 12:58:37 -0800

Changed in upstart (Ubuntu Karmic):
status: Fix Committed → Fix Released
Steve Langasek (vorlon) wrote :

(copied to lucid)

Changed in upstart (Ubuntu Karmic):
status: Fix Released → Fix Committed
Changed in upstart (Ubuntu Lucid):
status: Fix Committed → Fix Released
Thierry Carrez (ttx) wrote :

Validation against the dnsmasq issue:
Set 127.0.0.1 as one of the /etc/resolv.conf nameservers and artificially slow down lo coming up (add a sleep in an if-pre-up.d script).

Without karmic-proposed:
"grep dnsmasq /var/log/syslog" shows "using nameserver 127.0.0.1#53"

With karmic-proposed:
"grep dnsmasq /var/log/syslog" shows "ignoring nameserver 127.0.0.1 - local interface"

So this works for me.
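
The two outcomes above can be told apart mechanically. A rough sketch, assuming the log passed in covers a single boot; classify_dnsmasq_log and its verdict strings are invented for illustration, and only the two dnsmasq messages quoted above are matched:

```shell
# classify_dnsmasq_log
# Reads syslog text on stdin and prints a one-line verdict based on the two
# dnsmasq messages quoted in the validation above. Hypothetical helper.
classify_dnsmasq_log() {
    awk '
        /dnsmasq/ && /using nameserver 127\.0\.0\.1#53/ { used = 1 }
        /dnsmasq/ && /ignoring nameserver 127\.0\.0\.1 - local interface/ { ignored = 1 }
        END {
            if (used)         print "race: dnsmasq used 127.0.0.1 as upstream"
            else if (ignored) print "ok: dnsmasq ignored 127.0.0.1 as local"
            else              print "unknown: no matching dnsmasq lines"
        }
    '
}

# Example (log path as used in the validation above):
# classify_dnsmasq_log < /var/log/syslog
```

If the log spans several boots, both messages may appear; the sketch then reports the race, so trim the input to the boot under test first.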

Martin Pitt (pitti)
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 0.6.3-11

---------------
upstart (0.6.3-11) karmic-proposed; urgency=low

  * Make rc-sysinit.conf wait on the loopback interface, to ensure that the
    interface is up before we process the scripts in /etc/rc?.d. LP: #461725.
 -- Steve Langasek <email address hidden> Tue, 08 Dec 2009 12:58:37 -0800

Changed in upstart (Ubuntu Karmic):
status: Fix Committed → Fix Released
pjw (pjw1965) wrote :

This "fix" leads to another bug I found:
- event net-device-up IFACE=lo never comes up (I don't use dnsmasq)
- no init-scripts are executed after this "fix", worked before

I opened a bug report for this: https://bugs.launchpad.net/bugs/497299

$ ifconfig lo
lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:89044 errors:0 dropped:0 overruns:0 frame:0
          TX packets:89044 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:183331178 (183.3 MB) TX bytes:183331178 (183.3 MB)

Steve Langasek (vorlon) wrote :

As commented in bug #497299, this is an inevitable problem on systems with broken /etc/network/interfaces configurations if we want to address the race condition described in this bug. I don't think it should block the SRU publication, but that's a call that should be made by the member of the SRU team handling this bug.

Scott James Remnant (Canonical) (canonical-scott) wrote :

No, you're wrong.

This causes a major regression and should be reverted

Martin Pitt (pitti) wrote :

I subscribed ubuntu-sru to bug 497299 now, and read it.

What I wonder about is how machines without a lo interface could have ever worked sensibly -- you will not even get to gdm without one. Is there something else which hardcodes bringing up lo during boot?

Since that seems to have caused real regressions for apparently quite some people (which apparently are not entirely understood yet, i. e. how they ended up with a broken interfaces file in the first place), but fixed regressions for others (this bug), this seems to be between a rock and a hard place, so I wouldn't hectically revert this.

As a workaround for karmic, could we wait on lo only if "grep -q 'auto lo' /etc/network/interfaces"?

For lucid and onwards, should we add an upstart job somewhere that brings up lo during rcS if it is not brought up by /e/n/i ?
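
The 'auto lo' test proposed for karmic could be made slightly stricter than a plain grep, so that commented-out lines do not match. A sketch only: has_auto_lo is a hypothetical name, and it covers just the "auto" and "allow-auto" stanzas, not included files or other ways an interface might be brought up.

```shell
# has_auto_lo FILE
# Succeeds (exit 0) if FILE contains an "auto" or "allow-auto" stanza that
# names lo. Lines starting with "#" never match, because the first field is
# then not the bare keyword. Hypothetical helper.
has_auto_lo() {
    awk '
        $1 == "auto" || $1 == "allow-auto" {
            for (i = 2; i <= NF; i++) if ($i == "lo") found = 1
        }
        END { exit !found }
    ' "$1"
}

# Example (path as quoted in the comment above):
# has_auto_lo /etc/network/interfaces || echo "lo is not marked auto" >&2
```

This only detects the broken configuration; it does not by itself let an upstart job wait on lo conditionally.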

Thierry Carrez (ttx) wrote :

<ttx> slangasek: should the rc-sysinit/network race bug be reopened, following up on keybuk's https://bugs.launchpad.net/ubuntu/+source/upstart/+bug/461725/comments/15 ?
<slangasek> ttx: no, that bug is fixed; the regression has really only been confirmed on systems with a broken network config

Steve Langasek (vorlon) wrote : Re: [Bug 461725] Re: rc-sysinit job might start before loopback is up

On Wed, Dec 23, 2009 at 02:24:12PM -0000, Martin Pitt wrote:
> What I wonder about is how machines without a lo interface could have
> ever worked sensibly -- you will not even get to gdm without one. Is
> there something else which hardcodes bringing up lo during boot?

So the full story here is that NetworkManager will mask the broken /e/n/i by
helpfully bringing up the lo interface when it starts, *but doesn't call the
ifupdown hooks for it*, so upstart doesn't see that it's up.

> Since that seems to have caused real regressions for apparently quite
> some people (which apparently are not entirely understood yet, i. e. how
> they ended up with a broken interfaces file in the first place), but
> fixed regressions for others (this bug), this seems to be between a rock
> and a hard place, so I wouldn't hectically revert this.

> As a workaround for karmic, could we wait on lo only if "grep -q 'auto
> lo' /etc/network/interfaces"?

Not using upstart's event system.

We should instead arrange to fix /e/n/i on upgrade. I'll use bug #497299 to
track this.

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
<email address hidden>                              <email address hidden>

AmPuH (sriwijaya-team)
affects: upstart (Ubuntu Karmic) → ifupdown (Ubuntu Karmic)
Steve Langasek (vorlon) wrote :

This was a bug fixed in the upstart package. Please don't move it around.

affects: ifupdown (Ubuntu Karmic) → upstart (Ubuntu Karmic)