Severe regression with latest kernel update: 3.0.0-14.23 takes an unreasonable amount of time to boot due to udev

Bug #902491 reported by Sergio Callegari
This bug affects 9 people
Affects          Status     Importance  Assigned to  Milestone
linux (Ubuntu)   Triaged    High        Unassigned
lvm2 (Ubuntu)    Confirmed  Undecided   Unassigned
udev (Ubuntu)    Confirmed  Undecided   Unassigned

Bug Description

After the kernel update the machine takes ages to boot. At first I thought that the machine was not booting at all, but in fact it does boot; it just takes an absurd amount of time, during which no disk activity is shown.

The boot appears to hang at /scripts/init-bottom.

The same symptom is present with the 3.2RC4 kernel from the mainline PPA.

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: linux-image-3.0.0-14-generic 3.0.0-14.23
ProcVersionSignature: Ubuntu 3.0.0-14.23-generic 3.0.9
Uname: Linux 3.0.0-14-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 1.23-0ubuntu4
Architecture: amd64
ArecordDevices:
 Home directory /home/callegar not ours.
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: STAC92xx Analog [STAC92xx Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: callegar 2406 F.... pulseaudio
 /dev/snd/seq: timidity 5021 F.... timidity
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf6adc000 irq 48'
   Mixer name : 'Intel Cantiga HDMI'
   Components : 'HDA:111d76b2,1028024f,00100302 HDA:14f12c06,14f1000f,00100000 HDA:80862802,80860101,00100000'
   Controls : 20
   Simple ctrls : 12
Date: Sat Dec 10 12:11:47 2011
HibernationDevice: RESUME=UUID=14fe66c7-7c31-4fdb-9894-de7f5e8eca75
InstallationMedia: Kubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
MachineType: Dell Inc. Latitude E6500
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.0.0-14-generic root=/dev/mapper/group00-root ro recovery nomodeset
PulseSinks:
 Error: command ['pacmd', 'list-sinks'] failed with exit code 1: Home directory /home/callegar not ours.
 No PulseAudio daemon running, or not running as session daemon.
PulseSources:
 Error: command ['pacmd', 'list-sources'] failed with exit code 1: Home directory /home/callegar not ours.
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.0.0-14-generic N/A
 linux-backports-modules-3.0.0-14-generic N/A
 linux-firmware 1.60
RfKill: Error: [Errno 2] No such file or directory
SourcePackage: linux
UpgradeStatus: Upgraded to oneiric on 2011-10-16 (54 days ago)
dmi.bios.date: 08/19/2010
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A24
dmi.board.name: 0W612R
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA24:bd08/19/2010:svnDellInc.:pnLatitudeE6500:pvr:rvnDellInc.:rn0W612R:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: Latitude E6500
dmi.sys.vendor: Dell Inc.

Sergio Callegari (callegar) wrote :
Changed in linux (Ubuntu):
status: New → Confirmed
Sergio Callegari (callegar) wrote :

Trying to investigate a bit further

Issue is most likely with

# Stop udevd, we'll miss a few events while we run init, but we catch up
udevadm control --timeout=61 --exit

in /usr/share/initramfs-tools/scripts/init-bottom/udev

For some reason, with the -14 kernel this command returns only when the udevadm timeout expires, whereas with previous kernels udevadm exited much earlier.

As evidence, notice that reducing the timeout makes -14 completely unbootable (the machine comes up without /dev/sda, in spite of this device being mentioned in dmesg), while previous kernels could still boot properly.

Cannot investigate any further as udev debugging is completely out of my reach. Cannot even find why the udevadm timeout does not get reported anywhere.
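
One way to check whether the udevadm call really is what eats the time would be to wrap it in a small timing helper inside the init-bottom script. The following is only a sketch under assumptions: `timed` is a hypothetical helper name (not part of initramfs-tools or udev), and it assumes that `date +%s` works in the shell where the script runs.

```shell
#!/bin/sh
# Hypothetical helper: run a command, then report how long it took
# (in whole seconds) and its exit status.
timed() {
    start=$(date +%s)
    status=0
    "$@" || status=$?
    end=$(date +%s)
    # Report on stderr so the wrapped command's stdout is left untouched.
    echo "elapsed=$((end - start))s status=$status cmd=$*" >&2
    return "$status"
}
```

For example, replacing the bare call with `timed udevadm control --timeout=61 --exit` would print the elapsed seconds on the console; a value close to the timeout would support the suspicion that the command only returns when the timeout expires.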

Sergio Callegari (callegar) wrote :

This is happening on a Dell E6500 with Intel graphics, which should be a fairly common machine, in case this information helps with testing.

I have also tried the kernel on an Eee PC 1000H, which does not seem to be affected by the bug.

summary: - Severe regression with latest kernel update: 3.0.0-14.23 takes 5 minutes
- to boot
+ Severe regression with latest kernel update: 3.0.0-14.23 takes an
+ unreasonable amount of time to boot due to udev
Jason (jasonxh) wrote :

Same problem here. Had to revert to old kernel for now.

Sergio Callegari (callegar) wrote :

Tried to catch some error info by booting in recovery mode.

I think I managed to catch, for an instant, something about

watershed -c sh /sbin/lvm vgscan; /sbin/lvm vgchange -a y

Can this issue be related to lvm?

To Jason: do you have any lvm partition?

Sergio Callegari (callegar) wrote :

OK, I think I am probably running into Bug #802626 or Bug #818177.

What happens is that if I add --noudevsync to the lvm2 udev rule in /lib/udev and then update the initramfs, the delay disappears.

Thus adding lvm2 and udev to the bug.

However, I still wonder why I am running into this.

1) My other machines that are on LVM do not have this issue.
2) It is completely unclear to me what changed in the -14 kernel so that this kernel shows the issue and -13 does not.
3) My situation is not exactly the one in Bug #802626, because my root filesystem _is_ on a logical volume.
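
Since the points above hinge on whether the root filesystem lives on a logical volume, a quick way to compare machines is to look at the root= parameter of the kernel command line (the report above shows root=/dev/mapper/group00-root). The helpers below are a hedged sketch: `root_param` and `root_on_mapper` are hypothetical names, and a /dev/mapper path only indicates a device-mapper node, which also covers non-LVM targets such as dm-crypt.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the root= parameter found in
# the kernel command line passed as the first argument.
root_param() {
    for word in $1; do
        case "$word" in
            root=*) echo "${word#root=}"; return 0 ;;
        esac
    done
    return 1
}

# Hypothetical helper: succeed if the root device is a device-mapper node.
# Assumption: /dev/mapper/* is treated as "probably LVM", which is not
# guaranteed (dm-crypt and others use the same directory).
root_on_mapper() {
    case "$(root_param "$1")" in
        /dev/mapper/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On a running system this could be used as `root_on_mapper "$(cat /proc/cmdline)" && echo "root is on device-mapper"`.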

Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report at bugzilla.kernel.org [1]? That will allow the upstream developers to examine the issue, and may provide a quicker resolution to the bug.

If you are comfortable with opening a bug upstream, it would be great if you could report back the upstream bug number in this bug report. That will allow us to link this bug to the upstream report.

The bugzilla.kernel.org site may still be unavailable due to the recent break-in. If that is the case, please add the tag: kernel-needs-upstream-bug-report

[1] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
Joseph Salisbury (jsalisbury) wrote :

Also, you mention this is a regression. Does the 3.0.0-13-generic kernel boot in a reasonable amount of time? If not, can you post the last kernel that was booting properly?

tags: added: kernel-da-key
Sergio Callegari (callegar) wrote :

3.0.0-13 is the latest kernel that is booting fine.

3.0.0-14 does not, nor does 3.2RC4.

However, I am not entirely comfortable opening an upstream kernel bug. The following points are unclear to me:

1) I have a strong suspicion that the issue I am running into is caused by a subtle timing issue. This suspicion is supported by the fact that other machines that I have, with a very similar configuration, boot just fine. Maybe it is not that -14 is buggy, but only that it does something differently enough from -13 to alter timings, so that two very closely spaced events swap their order.

2) It is unclear to me whether the bug is in the kernel, in udev or in lvm2. It looks like the timing relationships among the three are still under investigation in Bug #802626. Specifically, it appears to me that Debian derivatives run the udev rules for lvm through watershed, which is possibly a different situation from other distros.

3) It is unclear to me which upstream kernel version to report the bug against. Which upstream kernel does the Ubuntu -14 correspond to?

Maybe an Ubuntu kernel developer would have sounder grounds for opening an upstream bug than I do.

Sergio Callegari (callegar) wrote :

In any case, if after thinking about the above points you still think that I should open an upstream bug, I'll be glad to do so.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in lvm2 (Ubuntu):
status: New → Confirmed
Changed in udev (Ubuntu):
status: New → Confirmed
Paul Nooan (paul-noonan) wrote :

It looks like I hit this problem too, although perhaps I didn't leave it long enough to see if it would boot. I just assumed that with no disk activity it wasn't going to boot.
In recovery mode it complained about a problem with root. Note that root is on lvm2 and I'm running 64-bit.

3.0.0-13 works fine.

Sergio Callegari (callegar) wrote :

Can those who hit the bug please check if the patch in #802626 fixes the issue?

(Basically, it consists of adding a --noudevsync parameter to the vgchange command in
/lib/udev/rules.d/85-lvm2.rules. Then you need to regenerate the initramfs with
update-initramfs -u)
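
The edit described above can be sketched as a small, non-destructive helper that prints the patched rule instead of editing it in place. This is only an illustration under assumptions: `add_noudevsync` is a hypothetical name, the rule is assumed to contain the literal text `vgchange -a y` (as in the command quoted earlier in this report), and appending the flag after `-a y` is one possible placement, not necessarily the one used by the patch in #802626.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with --noudevsync appended to every
# "vgchange -a y", skipping lines that already carry the flag.
# The file itself is not modified.
add_noudevsync() {
    sed '/--noudevsync/!s/vgchange -a y/vgchange -a y --noudevsync/' "$1"
}

# Possible usage (review the output before replacing the real file):
#   add_noudevsync /lib/udev/rules.d/85-lvm2.rules > /tmp/85-lvm2.rules.new
#   diff -u /lib/udev/rules.d/85-lvm2.rules /tmp/85-lvm2.rules.new
```

After installing a reviewed copy of the rule, the initramfs still has to be regenerated with update-initramfs -u, as noted above.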

If it does we can probably mark this bug as a duplicate of that one.

Joseph Salisbury (jsalisbury) wrote :

@Sergio, does the patch in 802626 resolve your issue?

tags: added: regression-update
Sergio Callegari (callegar) wrote :

Yes, as mentioned in #6, adding the --noudevsync parameter fixes the delay for me.

Now, I wonder if it is just me, or whether others have the same experience.

IMHO, this is important to know because the bug title in 802626 is about those who have the root filesystem outside LVM and all the rest in LVM, i.e. a rather restrictive case, but the proposed fix (at least for me) corrects the boot delay in more general cases too (I have the root fs in LVM).

If it fixes the bug for everybody, we can mark this bug as a duplicate and widen the title of 802626.

In the meantime, note that while --noudevsync fixes the delay, I do not know whether it may have nasty side effects on some configurations. I do not know udev well enough, but the lvm man page does say that --noudevsync should be used with care.

Sergio Callegari (callegar) wrote :

Another little note.

I did a lot of testing over the last few days.

I have a feeling that in extremely rare circumstances 3.0.0-13 also hangs at boot (it happened once in about 100 test boots).
The difference is that on my machine 3.0.0-13 hangs on about 1% of boots (which goes unnoticed), while -14 does so systematically.

Jason (jasonxh) wrote :

I have all my partitions on lvm except boot. I can confirm --noudevsync does work around the problem.

tags: added: kernel-key
James Keber (james-keber) wrote :

I've also run into this bug.

3.0.0-14 introduced a delay of precisely 60 seconds into my boot time. I tried 3.0.0-15 from proposed, with the same result. The only way to alleviate it was to roll back to 3.0.0-13 (the last kernel not to exhibit this behaviour). I've also tried a 3.2 kernel from Pangolin, and this too displays the delay.

I have my entire root file-system on LVM (including /boot), on GPT on an SSD.

I've now tried the "--noudevsync" fix and can confirm it removes the 60-second boot-delay when running 3.0.0-14 (and presumably all later kernels).

I'm interested in comment #2, which notes that this issue may be related to:

# Stop udevd, we'll miss a few events while we run init, but we catch up
udevadm control --timeout=61 --exit

...as this duration exactly matches the delay I experience.
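
To compare the observed delay against the configured value on a given system, the timeout can be read straight out of the init-bottom script. The helper below is a sketch under assumptions: `udev_exit_timeout` is a hypothetical name, and it assumes the script still contains a `udevadm control ... --timeout=N` line like the one quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print the --timeout value of each non-commented
# "udevadm control" call found in FILE.
udev_exit_timeout() {
    sed -n 's/^[^#]*udevadm control.*--timeout=\([0-9][0-9]*\).*/\1/p' "$1"
}

# Possible usage:
#   udev_exit_timeout /usr/share/initramfs-tools/scripts/init-bottom/udev
```

If the printed value is within a second or so of the measured boot delay, that is consistent with udevadm waiting out its full timeout.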

I had registered a bug about this (909805), but I've now marked it as a duplicate of this bug.

Wladimir Mutel (mwg) wrote :

I have a testing USB flash drive with bootable Ubuntu Oneiric on it.
I boot it with debug=y on the kernel command line, which causes the init scripts to output execution debug info right to the console.
I boot with both 3.0.0-13 and 3.0.0-14 and find that both of them hang exactly on run-init executing the real init on the root file system.
This run-init thing is absolutely undebuggable from the command line. I could run strace or ltrace over it, but they are dynamic executables, which require ld-linux.so to be mounted and active before they run.
What else should I do in this situation, to avoid excessive complications in startup troubleshooting?

Wladimir Mutel (mwg) wrote :

And yes, I tried to remove lvm2 & mdadm and regenerate initramfs,
this did not change anything.

Bluestone (jupitercuso4) wrote :

Yes, I have exactly the same problem on my Thinkpad X61.

Roman Tataurov (tataurov) wrote :

I have this bug too with 3.2 kernels.
Acer Aspire TimelineX 5820TG
Ubuntu 11.10
All my partitions are on LVM.

Roman Tataurov (tataurov) wrote :

I can confirm that --noudevsync does work around the problem. I just tested it on the latest 3.2.1 kernel.

James Keber (james-keber) wrote :

Given that the "--noudevsync" work-around from #802626 also appears to reliably fix this bug, I'm marking this bug as a duplicate of #802626.

Sergio Callegari (callegar) wrote :

Reopened the bug by removing the duplicate status, following the comment on bug 802626:

James,

I don't believe that's true. If the rootfs is in the same VG, there should be no possibility of udev being stopped in the initramfs prior to the dependent event making its way through the system, because without those events the rootfs can't be mounted at all. And certainly, root-on-LVM has been 100% reliable for me here. I think you should file a separate bug for the issue you're seeing.

Joseph Salisbury (jsalisbury) wrote :

Possible dups: bug 906358 and bug 631795

Changed in linux (Ubuntu):
importance: Medium → High
Alexandr Granin (graninas) wrote :

I hit the same bug, but I don't use any LVM. The system hangs on boot when I use the -15 kernel (both generic and manually recompiled). The -13 kernel works fine. I can't add --noudevsync to any line because there is no appropriate line to add it to.

Laptop: Acer Aspire 5739G.

tags: removed: kernel-key
tags: removed: kernel-da-key
Joseph Salisbury (jsalisbury) wrote :

Removing as a dup only temporarily.
