Unable to use "virsh migrate" on two hosts after moving to raring

Bug #1157626 reported by David
This bug affects 2 people
Affects: libvirt (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

I upgraded two hosts from 12.10 to 13.04 to test libvirt and qemu-kvm. Since the upgrade they are unable to perform live migrations with either shared storage (NFSv3) or local storage. These migrations worked on 12.10.
The hosts were using openvswitch for networking; however, I moved back to simple bridging (via bridge-utils) before reporting the bug, and the problem still occurs. The VM is a fresh 12.10 server install, but I can reproduce this on older existing VMs running several different operating systems.

Shared storage command used:
virsh migrate --live --verbose ubuntu1 qemu+ssh://10.150.0.51/system
Local storage command used:
virsh migrate --live --copy-storage-all --verbose ubuntu1 qemu+ssh://10.150.0.51/system

In each case the above command returns:
error: operation failed: migration job: unexpectedly failed

I have also tried this with the full DNS hostname instead of the IP address in the commands. Both hosts have /etc/hosts set up correctly, and forward and reverse DNS work fine for both. There were no changes to IP addresses or DNS after the upgrade from 12.10 to 13.04.

The VM also remounts root read-only (Linux guests) or panics as a filesystem vanishes (BSD guests). In rare cases this does not happen on the first attempt.

In libvirtd.log the sending host reports:

2013-03-20 09:28:42.121+0000: 1652: error : qemuMigrationUpdateJobStatus:1219 : operation failed: migration job: unexpectedly failed

The receiving host in libvirtd.log reports:

2013-03-20 09:28:42.095+0000: 1889: error : qemuMonitorIO:602 : internal error End of file from monitor

The receiving host in the qemu vm log reports:

qemu: warning: error while loading state section id 2
load of migration failed
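
For anyone sifting through similar logs, here is a minimal Python sketch that picks the error-level entries out of libvirtd.log lines like the ones quoted above. The field names (timestamp, thread, level, source, message) are my own labels for the colon-separated columns seen in these excerpts, not an official libvirt schema, and `parse_line` / `errors_only` are hypothetical helper names.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout assumed from the excerpts in this report:
#   <date> <time>: <id>: <level> : <function>:<line> : <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+ \S+): (?P<thread>\d+): (?P<level>\w+) : "
    r"(?P<source>\w+:\d+) : (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    thread: int
    level: str
    source: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None if the line does not match the layout."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return LogEntry(
        match["timestamp"],
        int(match["thread"]),
        match["level"],
        match["source"],
        match["message"],
    )


def errors_only(lines: Iterable[str]) -> List[LogEntry]:
    """Keep only entries whose level is 'error'."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.level == "error"]
```

Feeding it the lines of a libvirtd.log file (for example `errors_only(open(path))`, with `path` pointing at your log) returns just the error entries, which makes the migration failure easier to spot among debug and warning output.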

I have attached the vm config.

Tags: raring

David (mardraum) wrote :

I should add that a friend who uses Arch Linux gets the same error after upgrading to the libvirt and qemu-kvm versions available there as of this week.

Changed in libvirt (Ubuntu):
importance: Undecided → High
Serge Hallyn (serge-hallyn) wrote :

When I do

virsh migrate --live --copy-storage-all --verbose vm2 qemu+ssh://root@10.42.43.1/system

with loglevel 1, libvirtd.log gives:

2013-03-20 17:17:28.386+0000: 28608: warning : virAuditSend:135 : Failed to send audit message virt=kvm resrc=disk reason=start vm="vm2" uuid=1984e56c-0217-563c-fb23-b344053b385e old-disk="?" new-disk="/var/lib/libvirt/images/vm2.img": Operation not permitted
2013-03-20 17:17:28.386+0000: 28608: warning : virAuditSend:135 : Failed to send audit message virt=kvm resrc=net reason=start vm="vm2" uuid=1984e56c-0217-563c-fb23-b344053b385e old-net=? new-net=52:54:00:4E:72:E9: Operation not permitted
2013-03-20 17:17:28.387+0000: 28608: warning : virAuditSend:135 : Failed to send audit message virt=kvm resrc=mem reason=start vm="vm2" uuid=1984e56c-0217-563c-fb23-b344053b385e old-mem=0 new-mem=1048576: Operation not permitted
2013-03-20 17:17:28.387+0000: 28608: warning : virAuditSend:135 : Failed to send audit message virt=kvm resrc=vcpu reason=start vm="vm2" uuid=1984e56c-0217-563c-fb23-b344053b385e old-vcpu=0 new-vcpu=2: Operation not permitted
2013-03-20 17:17:28.387+0000: 28608: warning : virAuditSend:135 : Failed to send audit message virt=kvm op=start reason=migrated vm="vm2" uuid=1984e56c-0217-563c-fb23-b344053b385e vm-pid=-1: Operation not permitted

Changed in libvirt (Ubuntu):
status: New → Triaged
Serge Hallyn (serge-hallyn) wrote :

However, disabling AppArmor doesn't help it succeed.

Serge Hallyn (serge-hallyn) wrote :

Running with shared storage, I see

2013-03-20 17:36:40.328+0000: 1189: warning : virAuditSend:135 : Failed to send audit message virt=kvm resrc=cgroup reason=allow vm="vm2" uuid=1984e56c-0217-563c-fb23-b344053b385e cgroup="/sys/fs/cgroup/devices/libvirt/qemu/vm2/" class=major category=pty maj=88 acl=rw: Operation not permitted

Serge Hallyn (serge-hallyn) wrote :

2013-03-20 17:36:40.328+0000: 1189: debug : virCgroupSetValueStr:331 : Set value '/sys/fs/cgroup/devices/libvirt/qemu/vm2/devices.allow' to 'c 136:* rw'
2013-03-20 17:36:40.328+0000: 1189: debug : virFileClose:72 : Closed fd 24
2013-03-20 17:36:40.328+0000: 1189: warning : virAuditSend:135 : Failed to send audit message virt=kvm resrc=cgroup reason=allow vm="vm2" uuid=1984e56c-0217-563c-fb23-b344053b385e cgroup="/sys/fs/cgroup/devices/libvirt/qemu/vm2/" class=major category=pty maj=88 acl=rw: Operation not permitted
2013-03-20 17:36:40.328+0000: 1189: debug : virCgroupSetValueStr:331 : Set value '/sys/fs/cgroup/devices/libvirt/qemu/vm2/devices.allow' to 'c 1:3 rw'
2013-03-20 17:36:40.328+0000: 1189: debug : virFileClose:72 : Closed fd 24
2013-03-20 17:36:40.328+0000: 1189: warning : virAuditSend:135 : Failed to send audit message virt=kvm resrc=cgroup reason=allow vm="vm2" uuid=1984e56c-0217-563c-fb23-b344053b385e cgroup="/sys/fs/cgroup/devices/libvirt/qemu/vm2/" class=path path=/dev/null rdev=01:03 acl=rw: Operation not permitted
2013-03-20 17:36:40.328+0000: 1189: debug : virCgroupSetValueStr:331 : Set value '/sys/fs/cgroup/devices/libvirt/qemu/vm2/devices.allow' to 'c 1:7 rw'

Looks like something is preventing us from setting our devices cgroups.

Serge Hallyn (serge-hallyn) wrote :

It looks like /sys/fs/cgroup/*/libvirt/qemu/$vm needs to be added to the libvirt-qemu AppArmor abstraction.

Also, /dev/ptmx is now a symlink or bind mount to /dev/pts/ptmx. /etc/apparmor.d/abstractions/libvirt-qemu lists /dev/ptmx, but also needs to list /dev/pts/ptmx.

After adding /sys/fs/cgroup/** rw and /dev/pts/ptmx rw to /etc/apparmor.d/abstractions/libvirt-qemu, I get what appear to be libvirt-qemu incompatibilities:

2013-03-20 17:50:50.759+0000: 5788: debug : qemuMonitorJSONIOProcessLine:170 : QEMU_MONITOR_RECV_REPLY: mon=0x7f0414007f50 reply={"return": [{"name": "SPICE_MIGRATE_COMPLETED"}, {"name": "BALLOON_CHANGE"}, {"name": "WAKEUP"}, {"name": "SUSPEND_DISK"}, {"name": "SUSPEND"}, {"name": "DEVICE_TRAY_MOVED"}, {"name": "BLOCK_JOB_READY"}, {"name": "BLOCK_JOB_ERROR"}, {"name": "BLOCK_JOB_CANCELLED"}, {"name": "BLOCK_JOB_COMPLETED"}, {"name": "SPICE_DISCONNECTED"}, {"name": "SPICE_INITIALIZED"}, {"name": "SPICE_CONNECTED"}, {"name": "WATCHDOG"}, {"name": "RTC_CHANGE"}, {"name": "BLOCK_IO_ERROR"}, {"name": "VNC_DISCONNECTED"}, {"name": "VNC_INITIALIZED"}, {"name": "VNC_CONNECTED"}, {"name": "RESUME"}, {"name": "STOP"}, {"name": "POWERDOWN"}, {"name": "RESET"}, {"name": "SHUTDOWN"}], "id": "libvirt-5"}
2013-03-20 17:50:50.798+0000: 5788: debug : qemuMonitorIOProcess:342 : QEMU_MONITOR_IO_PROCESS: mon=0x7f0414007f50 buf={"id": "libvirt-9", "error": {"class": "DeviceNotFound", "desc": "Device 'virtio-blk-s390' not found"}}
2013-03-20 17:50:50.798+0000: 5788: debug : qemuMonitorJSONIOProcessLine:150 : Line [{"id": "libvirt-9", "error": {"class": "DeviceNotFound", "desc": "Device 'virtio-blk-s390' not found"}}]
2013-03-20 17:50:50.798+0000: 5788: debug : virJSONValueFromString:944 : string={"id": "libvirt-9", "error": {"class": "DeviceNotFound", "desc": "Device 'virtio-blk-s390' not found"}}
2013-03-20 17:50:50.799+0000: 5788: debug : qemuMonitorJSONIOProcessLine:170 : QEMU_MONITOR_RECV_REPLY: mon=0x7f0414007f50 reply={"id": "libvirt-9", "error": {"class": "DeviceNotFound", "desc": "Device 'virtio-blk-s390' not found"}}
2013-03-20 17:50:50.802+0000: 5788: debug : qemuMonitorIOProcess:342 : QEMU_MONITOR_IO_PROCESS: mon=0x7f0414007f50 buf={"id": "libvirt-10", "error": {"class": "DeviceNotFound", "desc": "Device 'virtio-net-s390' not found"}}
2013-03-20 17:50:50.802+0000: 5788: debug : qemuMonitorJSONIOProcessLine:150 : Line [{"id": "libvirt-10", "error": {"class": "DeviceNotFound", "desc": "Device 'virtio-net-s390' not found"}}]
2013-03-20 17:50:50.802+0000: 5788: debug : virJSONValueFromString:944 : string={"id": "libvirt-10", "error": {"class": "DeviceNotFound", "desc": "Device 'virtio-net-s390' not found"}}
2013-03-20 17:50:50.802+0000: 5788: debug : qemuMonitorJSONIOProcessLine:170 : QEMU_MONITOR_RECV_REPLY: mon=0x7f0414007f50 reply={"id": "libvirt-10", "error": {"class": "DeviceNotFound", "desc": "Device 'virtio-net-s390' not found"}}
2013-03-20 17:50:50.805+0000: 5788: debug : qemuMonitorIOProcess:342 : QEMU_MONITOR_IO_PROCESS: mon=0x7f0414007f50 buf={"id": "libvirt-11", "error": {"class": "DeviceNotFound", "desc": "Device 'pci-assign' not found"}}
2013-03-20 17:50:50.805+0000: 5788: debug : qemuMonitorJSONIOProcessLine:150 : Line [{"id": "libvirt-11", "er...

Serge Hallyn (serge-hallyn) wrote :

Looks like the errors in the last comment were not meaningful. The actual problem was that although I specified the IP address, virsh migrate later wanted a hostname tied to that IP address. After setting that up, I finally got the bug submitter's original problem:

qemu: warning: error while loading state section id 2
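
As a quick pre-flight check for the hostname issue mentioned above, the sketch below asks whether an IP address resolves back to a name. It only illustrates the idea; `reverse_name` is a hypothetical helper, and passing this check does not guarantee that libvirt will be satisfied with the result.

```python
import socket
from typing import Callable, List, Optional, Tuple

# Same shape as socket.gethostbyaddr: (hostname, aliases, addresses).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def reverse_name(ip: str, resolver: Resolver = socket.gethostbyaddr) -> Optional[str]:
    """Return the primary hostname for `ip`, or None if it does not resolve.

    The resolver is injectable so the function can be exercised without
    touching real DNS or /etc/hosts.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None
    return hostname
```

Running `reverse_name("10.150.0.51")` on the sending host (the address is the one from the original report) shows which name, if any, the destination IP maps to.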

Serge Hallyn (serge-hallyn) wrote :

Manually migrating in qemu seems to work ok. Need to figure out whether one of the objects being migrated by libvirt is not working right, or if libvirt is setting something up wrong.

Serge Hallyn (serge-hallyn) wrote :

Actually today manual migration in kvm is not working for me.

Serge Hallyn (serge-hallyn) wrote :

I've opened bug 116963 to track the qemu bug. I suspect that when the qemu bug is fixed, we'll still need an AppArmor fix or two in libvirt to completely solve this bug.

Serge Hallyn (serge-hallyn) wrote :

This is not a bug in libvirt after all. When I fix migration.c to retry on -EAGAIN as well as -EINTR when flushing a buffer, the migration completes. However, the resulting VM on the other end is corrupt. I can ssh in, but eventually ls of any data not yet in the buffer cache results in an I/O error and an eventual crash.

The code which I had to patch has been removed in latest upstream git HEAD, investigating that...
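
To illustrate the retry pattern described above (this is not the qemu migration.c code, just a Python sketch of the same idea), the hypothetical `flush_buffer` helper below writes a buffer to a file descriptor and retries on both EINTR and EAGAIN. On EAGAIN it waits with `select` until the descriptor is writable again instead of spinning.

```python
import errno
import os
import select


def flush_buffer(fd: int, data: bytes) -> int:
    """Write all of `data` to `fd`, retrying on EINTR and EAGAIN.

    Returns the number of bytes written (len(data) on success).
    """
    view = memoryview(data)
    written = 0
    while written < len(view):
        try:
            written += os.write(fd, view[written:])
        except OSError as exc:
            if exc.errno == errno.EINTR:
                continue  # interrupted by a signal: retry the write
            if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                # Non-blocking fd has no room: wait until it is writable.
                select.select([], [fd], [])
                continue
            raise
    return written
```

A writer that treats EAGAIN as fatal fails as soon as a non-blocking descriptor fills up, which matches the symptom of a migration that aborts partway through. The other way out is to leave the descriptor blocking in the first place.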

Changed in qemu (Ubuntu):
status: New → Triaged
importance: Undecided → High
Serge Hallyn (serge-hallyn) wrote :

Oops, the corruption was due to mismatched libvirtd groups with shared media stored over NFS.

Serge Hallyn (serge-hallyn) wrote :

Using upstream git head fails a different way, for the record.

Serge Hallyn (serge-hallyn) wrote :

A test package with my proposed fix is building in ppa:serge-hallyn/virt
(qemu_1.4.0+dfsg-1expubuntu5~ppa1).

David (mardraum) wrote :

I updated both test hosts to latest raring, then added the ppa, upgraded, and confirmed with dpkg all "qemu-" packages on both hosts are "1.4.0+dfsg-1expubuntu5~ppa1". I then started my test VM, tried to migrate (shared storage) and still have the same problem.

Sending host libvirtd.log:

2013-04-03 12:25:10.562+0000: 1665: error : qemuMigrationUpdateJobStatus:1219 : operation failed: migration job: unexpectedly failed
2013-04-03 12:26:06.786+0000: 1654: error : qemuMonitorIO:602 : internal error End of file from monitor
2013-04-03 12:26:07.338+0000: 1654: warning : virAuditSend:135 : Failed to send audit message virt=kvm op=stop reason=shutdown vm="ubuntu1" uuid=977b8f97-4db2-c54d-c269-13c448d11b8c vm-pid=-1: Operation not permitted

Receiving host libvirtd.log:

2013-04-03 12:25:10.143+0000: 2768: warning : virAuditSend:135 : Failed to send audit message virt=kvm vm="ubuntu1" uuid=977b8f97-4db2-c54d-c269-13c448d11b8c vm-ctx=libvirt-977b8f97-4db2-c54d-c269-13c448d11b8c img-ctx=libvirt-977b8f97-4db2-c54d-c269-13c448d11b8c model=apparmor: Operation not permitted
2013-04-03 12:25:10.143+0000: 2768: warning : virAuditSend:135 : Failed to send audit message virt=kvm vm="ubuntu1" uuid=977b8f97-4db2-c54d-c269-13c448d11b8c vm-ctx=106:106 img-ctx=106:106 model=dac: Operation not permitted
2013-04-03 12:25:10.143+0000: 2768: warning : virAuditSend:135 : Failed to send audit message virt=kvm resrc=cgroup reason=deny vm="ubuntu1" uuid=977b8f97-4db2-c54d-c269-13c448d11b8c cgroup="/sys/fs/cgroup/devices/libvirt/qemu/ubuntu1/" class=all: Operation not permitted
2013-04-03 12:25:10.144+0000: 2768: warning : virAuditSend:135 : Failed to send audit message virt=kvm resrc=cgroup reason=allow vm="ubuntu1" uuid=977b8f97-4db2-c54d-c269-13c448d11b8c cgroup="/sys/fs/cgroup/devices/libvirt/qemu/ubuntu1/" class=major category=pty maj=88 acl=rw: Operation not permitted
2013-04-03 12:25:10.144+0000: 2768: warning : virAuditSend:135 : Failed to send audit message virt=kvm resrc=cgroup reason=allow vm="ubuntu1" uuid=977b8f97-4db2-c54d-c269-13c448d11b8c cgroup="/sys/fs/cgroup/devices/libvirt/qemu/ubuntu1/" class=path path=/dev/null rdev=01:03 acl=rw: Operation not permitted
2013-04-03 12:25:10.144+0000: 2768: warning : virAuditSend:135 : Failed to send audit message virt=kvm resrc=cgroup reason=allow vm="ubuntu1" uuid=977b8f97-4db2-c54d-c269-13c448d11b8c cgroup="/sys/fs/cgroup/devices/libvirt/qemu/ubuntu1/" class=path path=/dev/full rdev=01:07 acl=rw: Operation not permitted
2013-04-03 12:25:10.144+0000: 2768: warning : virAuditSend:135 : Failed to send audit message virt=kvm resrc=cgroup reason=allow vm="ubuntu1" uuid=977b8f97-4db2-c54d-c269-13c448d11b8c cgroup="/sys/fs/cgroup/devices/libvirt/qemu/ubuntu1/" class=path path=/dev/zero rdev=01:05 acl=rw: Operation not permitted
2013-04-03 12:25:10.144+0000: 2768: warning : virAuditSend:135 : Failed to send audit message virt=kvm resrc=cgroup reason=allow vm="ubuntu1" uuid=977b8f97-4db2-c54d-c269-13c448d11b8c cgroup="/sys/fs/cgroup/devices/libvirt/qemu/ubuntu1/" class=path path=/dev/random rdev=01:08 acl=rw: Operation not permitted
2013-04-03 12:25:10.144+0000: 2768: warning : virAuditSend:135 : Failed to send audit messag...

Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1157626] Re: Unable to use "virsh migrate" on two hosts after moving to raring

Quoting David (<email address hidden>):
> I updated both test hosts to latest raring, then added the ppa,
> upgraded, and confirmed with dpkg all "qemu-" packages on both hosts are
> "1.4.0+dfsg-1expubuntu5~ppa1". I then started my test VM, tried to
> migrate (shared storage) and still have the same problem.

Sorry - I had a typo in the patch in the package! I didn't test until
this morning when I noticed it wasn't working.

A new package (~ppa2) is building now.

David (mardraum) wrote :

I can confirm that using the qemu- ppa2 packages, I can migrate the test VM successfully to and fro.

A friend using Arch tested the qemu fix successfully too. He then reverted the qemu fix and tried with libvirt 1.0.4, and his migrations were still successful. Are there bugs and workarounds on both sides? Will raring get libvirt 1.0.4?

Serge Hallyn (serge-hallyn) wrote :

Quoting David (<email address hidden>):
> I can confirm that using the qemu- ppa2 packages, I can migrate the test
> VM successfully to and fro.

Afterward, when you log in and run a binary which you hadn't previously
run (to force loading it from disk), does it work? I'm getting memory
corruption at that point:

serge@vm2:~$ id
-bash: /usr/bin/id: Input/output error

but I'm still trying to figure out if it's a result of these rather
different machines and setup.

> A friend using arch tested the qemu fix successfully too. He then
> reverted the qemu fix and tried with libvirt 1.0.4 and his migrations
> were still successful. Are there bugs and workarounds on both sides?
> Will raring get libvirt 1.0.4?

That wasn't the plan, but it may be worthwhile if it solves this without
a hack. Thanks for that info!

David (mardraum) wrote :

My VM works fine for any binary I can think of offhand: logging in and out, migrating, logging in again, trying a new binary, checking for updates, etc. No problems.

Serge Hallyn (serge-hallyn) wrote :

A full backport of 1.4.1-stable is available in ppa:serge-hallyn/virt; it fixes this on its own. We are also getting ready to push libvirt 1.0.4, which should also fix this (meaning one or the other should not be needed, but both carry other fixes).

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 1.0.2-0ubuntu11

---------------
libvirt (1.0.2-0ubuntu11) raring; urgency=low

  * debian/patches/nonblock-fix.patch: cherrypicked upstream patch to
    not mark qemu migration fd non-blocking. This fixes tcp live
    migration. (LP: #1157626)
 -- Serge Hallyn <email address hidden> Thu, 18 Apr 2013 10:43:26 -0500

Changed in libvirt (Ubuntu):
status: Triaged → Fix Released
Uzzi (andreaussi-yahoo) wrote :

I'm using libvirt-bin 1.0.2-0ubuntu11.13.04.2, qemu-kvm 1.4.0+dfsg-1expubuntu4 - qemu-system-x86 (2 1.4.0+dfsg-1expubuntu4)
on Ubuntu 13.04 and I have the same problem:

2013-07-24 08:50:36.748+0000: 12330: error : qemuMonitorIO:602 : internal error End of file from monitor
2013-07-24 08:50:37.281+0000: 12330: warning : virAuditSend:135 : Failed to send audit message virt=kvm op=stop reason=shutdown vm="DBhost" uuid=51051106-6c05-479d-59d6-47ec20ce4d86 vm-pid=-1: Operation not permitted
2013-07-24 08:50:48.293+0000: 12330: error : qemuMonitorIO:602 : internal error End of file from monitor
2013-07-24 08:50:48.584+0000: 12330: warning : virAuditSend:135 : Failed to send audit message virt=kvm op=stop reason=shutdown vm="SpagoBI4src" uuid=5a70f9b1-dc42-528b-e64d-10f4f3800c77 vm-pid=-1: Operation not permitted

Uzzi (andreaussi-yahoo) wrote :

My system works after editing /etc/libvirt/qemu.conf and setting this:
user = "libvirt-qemu"
group = "kvm"
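
To compare these settings across both hosts, a small Python sketch like the following can read the quoted `key = "value"` lines from qemu.conf. `read_settings` is a hypothetical helper and only handles simple quoted string values, which is all the two settings above need; it is not a full parser for the file.

```python
import re
from typing import Dict

# Matches lines such as:  user = "libvirt-qemu"
SETTING = re.compile(r'^\s*(?P<key>\w+)\s*=\s*"(?P<value>[^"]*)"\s*$')


def read_settings(text: str) -> Dict[str, str]:
    """Collect uncommented key = "value" pairs from qemu.conf-style text."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip commented-out defaults
        match = SETTING.match(line)
        if match:
            settings[match["key"]] = match["value"]
    return settings
```

Calling `read_settings(open("/etc/libvirt/qemu.conf").read())` on each host and comparing the `user` and `group` entries shows whether the two sides run guests under the same identity.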

André Bauer (monotek) wrote :

Same here after updating from 12.04 to 13.10 :-(

no longer affects: qemu (Ubuntu)